Measurement of differential tt̄ production cross sections using top quarks at large transverse momenta in pp collisions at √s = 13 TeV

A measurement is reported of differential top quark pair (tt̄) production cross sections, where top quarks are produced at large transverse momenta. The data collected with the CMS detector at the LHC are from pp collisions at a center-of-mass energy of 13 TeV corresponding to an integrated luminosity of 35.9 fb⁻¹. The measurement uses events where at least one top quark decays as t → Wb → qq̄′b and is reconstructed as a large-radius jet with transverse momentum in excess of 400 GeV. The second top quark is required to decay either in a similar way or leptonically, as inferred from a reconstructed electron or muon, a bottom quark jet, and missing transverse momentum due to the undetected neutrino. The cross section is extracted as a function of kinematic variables of individual top quarks or of the tt̄ system. The results are presented at the particle level, within a region of phase space close to that of the experimental acceptance, and at the parton level and are compared to various theoretical models. In both decay channels, the observed absolute cross sections are significantly lower than the predictions from theory, while the normalized differential measurements are well described.


I. INTRODUCTION
The top quark completes the third generation of quarks in the standard model (SM), and a precise understanding of its properties is critical for the overall consistency of the theory. Measurements of the top quark-antiquark pair (tt̄) production cross section confront the expectations from QCD but could also be sensitive to effects of physics beyond the SM. In particular, tt̄ production constitutes a dominant SM background to many direct searches for beyond-the-SM phenomena, and its detailed characterization is therefore important for confirming possible discoveries.
The large tt̄ yield expected in pp collisions at the CERN LHC enables measurements of the tt̄ production rate as functions of kinematic variables of individual top quarks and the tt̄ system. Such measurements have been performed at the ATLAS [1-9] and CMS [10-19] experiments at 7, 8, and 13 TeV center-of-mass energies, assuming a resolved final state where the decay products of the tt̄ system can be reconstructed individually. Resolved top quark reconstruction is possible for top quark transverse momenta (p_T) up to about 500 GeV. At higher p_T, the top quark decay products are highly collimated ("Lorentz boosted"), and they can no longer be reconstructed separately. To explore the highly boosted phase space, top quark decays are reconstructed as large-radius (R) jets in this analysis. Previous efforts in this domain by ATLAS [20,21] and CMS [22] confirm that it is feasible to perform precise differential measurements of high-p_T tt̄ production and have also indicated possibly interesting deviations from theory.
This paper reports a measurement of the differential tt̄ production cross section in the boosted regime in the all-jet and lepton + jets final states. The results are based on pp collisions at √s = 13 TeV recorded by the CMS detector, corresponding to a total integrated luminosity of 35.9 fb⁻¹.
In the all-jet decay channel, each W boson arising from the t → Wb transition decays into a quark (q) and antiquark (q̄′). As a result, the final state consists of at least six quarks, two of which are bottom quarks. Additional partons, gluons or quarks, can arise from initial-state radiation (ISR) and final-state radiation (FSR). The sizable boost of the top quarks in this measurement (p_T > 400 GeV) provides two top quarks reconstructed as large-R jets, and the final state therefore consists of at least two such jets. In the lepton + jets channel, one top quark decays according to t → Wb → qq̄′b and is reconstructed as a single large-R jet, while the second top quark decays to a bottom quark and a W boson that in turn decays to a charged lepton (l), either an electron (e) or a muon (μ), and a neutrino (t → Wb → lνb). Decays of W bosons via τ leptons to electrons or muons are treated as signal.
The measurements were performed using a larger integrated luminosity and a higher center-of-mass energy than in previous CMS results [22]. This allows a sharper confrontation of data with theory over a wider region of phase space.
The paper is organized as follows. Section II describes the main features of the CMS detector and the triggering system. Section III gives the details of the Monte Carlo (MC) simulations. Event reconstruction and selection are outlined in Secs. IV and V, respectively. In Sec. VI, we discuss the estimation of the background contributions, followed by a description of signal extraction in Sec. VII. Systematic uncertainties are discussed in Sec. VIII. The unfolding procedure used to obtain the particle- and parton-level cross sections and the resulting measurements are presented in Sec. IX. Finally, Sec. X provides a brief summary of the paper.

II. CMS DETECTOR
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. A silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two end cap sections, reside within the magnetic volume. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and end cap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system and kinematic variables, can be found in Ref. [23].
Events of interest are selected using a two-tiered trigger system [24]. The first level (L1), composed of specialized hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of about 100 kHz within a fixed time interval of 4 μs. The second level, known as the high-level trigger (HLT), consists of a farm of processors that run the full event reconstruction software in a configuration for fast processing and reduces the event rate to about 1 kHz before data storage.

III. EVENT SIMULATION
We use MC simulation to generate event samples for the tt̄ signal and also to model the contributions from some of the background processes. The tt̄ events are generated at next-to-leading order (NLO) in QCD using POWHEG (version 2) [25-29], assuming a top quark mass m_t = 172.5 GeV. Single top quark production in the t channel and in association with a W boson is simulated at NLO with POWHEG [30], while s channel production is negligible in this analysis. The production of W and Z bosons in association with jets (V + jets), as well as multijet events, is simulated using the MadGraph5_aMC@NLO [31] (version 2.2.2) generator at leading order (LO), with the MLM matching algorithm [32] to avoid double-counting of partons. Samples of diboson (WW, WZ, or ZZ) events are simulated at LO using PYTHIA (version 8.212) [33,34].
All simulated events are processed using PYTHIA to model parton showering, hadronization, and the underlying event (UE). The NNPDF3.0 [35] parton distribution functions (PDFs) are used to generate the events, and the CUETP8M1 UE tune [36] is used for all but the tt̄ and single top quark processes. For these, the CUETP8M2T4 tune with an adjusted value of the strong coupling α_S is used, yielding an improved modeling of tt̄ event properties [37]. The simulation of the response of the CMS detector is based on GEANT4 [38]. Additional pp interactions in the same or neighboring bunch crossings (pileup) are simulated through PYTHIA and overlaid with events generated according to the pileup distribution measured in data. An average of 27 pileup interactions was observed for the collected data.
The simulated processes are normalized to their best known theoretical cross sections. Specifically, the tt̄, V + jets, and single top quark event samples are normalized to next-to-NLO precision in QCD [39-41].
The measured differential cross sections for tt̄ production are compared with state-of-the-art theoretical expectations provided by the NLO POWHEG generator, combined with PYTHIA for parton showering, as described above, or combined with NLO HERWIG++ [42] and the corresponding EE5C UE tune [43]. In addition, a comparison is performed with MadGraph5_aMC@NLO [31] using PYTHIA for the parton showering.

IV. EVENT RECONSTRUCTION
Global event reconstruction, also called particle-flow (PF) event reconstruction [44], aims to reconstruct and identify each individual particle in an event through an optimized combination of information from all subdetectors. In this process, the particle type (photon, electron, muon, and charged or neutral hadron) plays an important role in the determination of particle direction and energy. Photons are identified as ECAL energy clusters not linked to the extrapolation of any charged-particle trajectory to the ECAL. Electrons are identified as primary charged-particle tracks and potentially multiple ECAL energy clusters corresponding to extrapolation of these tracks to the ECAL and to possible bremsstrahlung photons emitted along the way through the tracker material. Muons are identified as tracks in the central tracker consistent with either a track or several hits in the muon system associated with calorimeter deposition compatible with the muon hypothesis. Charged hadrons are identified as charged-particle tracks that are identified as neither electrons nor muons. Finally, neutral hadrons are identified as HCAL energy clusters not linked to any charged-hadron trajectory or as a combined ECAL and HCAL energy excess relative to the expected deposit of the charged-hadron energy.
The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the track momentum at the main interaction vertex, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The momentum of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
Leptons and charged hadrons are required to be compatible with originating from the primary interaction vertex. The candidate vertex with the largest value of summed physics-object p_T² is taken to be the primary pp interaction vertex. For this purpose, the physics objects are the jets, clustered using the jet finding algorithm [45,46] with the tracks assigned to candidate vertices as inputs, and the negative vector p_T sum of those jets. Charged hadrons that are associated with a pileup vertex are classified as pileup candidates and are ignored in the subsequent event reconstruction. Electron and muon objects are first identified from corresponding electron or muon PF candidates. Next, jet clustering is performed on all PF candidates that are not classified as pileup candidates. The jet clustering does not exclude the electron and muon PF candidates, even if these have already been assigned to electron/muon objects. A dedicated removal of overlapping physics objects is therefore used at the analysis level to avoid double counting.
Electrons and muons selected in the l + jets channel must have p_T > 50 GeV and |η| < 2.1. For vetoing leptons in the all-jet channel, they are instead required to have p_T > 20 GeV and |η| < 2.1. Leptons are also required to be isolated according to the "mini-isolation" (I_mini) algorithm, which requires the scalar p_T sum of tracks in a cone around the electron or muon to be less than a given fraction of the lepton p_T (p_T^l) [47]. The width of the cone (ΔR) depends on the lepton p_T, being defined as ΔR = (10 GeV)/p_T^l for p_T^l < 200 GeV and ΔR = 0.05 for p_T^l > 200 GeV. This algorithm retains high isolation efficiency for leptons originating from decays of highly boosted top quarks. A value of I_mini < 0.1 is chosen, corresponding to approximately a 95% efficiency. For vetoing additional leptons in the l + jets channel, the same lepton selection is used with the isolation requirement removed. Correction factors are applied to account for differences between data and simulation in the modeling of lepton identification, isolation, and trigger efficiencies, determined as functions of |η| and p_T of the electron or muon using a "tag-and-probe" method [48].
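As an illustration of the p_T-dependent cone described above, the following minimal Python sketch computes the mini-isolation cone size and applies the I_mini < 0.1 requirement. The function names and the representation of tracks as a plain list of p_T values are hypothetical choices made for this example; the text does not specify the behavior at exactly 200 GeV, where both expressions give 0.05.

```python
def mini_iso_cone(lepton_pt):
    """Cone size Delta R for a lepton with transverse momentum lepton_pt (GeV)."""
    if lepton_pt <= 0.0:
        raise ValueError("lepton_pt must be positive")
    if lepton_pt < 200.0:
        return 10.0 / lepton_pt  # cone shrinks as the lepton p_T grows
    return 0.05                  # fixed cone at and above 200 GeV


def is_isolated(lepton_pt, track_pts_in_cone, max_fraction=0.1):
    """Check I_mini < max_fraction.

    track_pts_in_cone: p_T values (GeV) of the tracks already found inside
    the cone (the track-to-cone matching is assumed to be done elsewhere).
    """
    i_mini = sum(track_pts_in_cone) / lepton_pt
    return i_mini < max_fraction
```

For example, a 100 GeV lepton is assigned a cone of ΔR = 0.1, and it is considered isolated if the tracks inside that cone carry less than 10 GeV in total.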
In each event, jets are clustered using the reconstructed PF candidates through the infrared- and collinear-safe anti-k_T algorithm [45,46]. Two jet collections are considered to identify b and t jet candidates. Small-R jets are clustered using a distance parameter of 0.4 in the l + jets channel and large-R jets using a distance parameter of 0.8 in the all-jet and l + jets channels. The jet momenta are determined through the vector sum of all particle momenta in the jet and found from simulation to be typically within 5%-10% of the true momentum over the entire spectrum and detector acceptance. Additional pp interactions can contribute more tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, the pileup candidates are discarded before the clustering, and an offset correction is applied to correct for the remaining contributions from neutral particles [49].
Jet energy corrections are obtained from simulation to bring the average measured response of jets to that of particle-level jets. In situ measurements of the momentum balance in dijet, photon + jet, Z + jet, and multijet events are used to account for any residual differences in the jet energy scale (JES) between data and simulation [50]. The jet energy resolution (JER) amounts typically to 15%-20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV. Additional criteria are applied to remove jets that are due to anomalous signals in the subdetectors or due to reconstruction failures [51].
A grooming technique is used to remove soft, wide-angle radiation from the large-R jets and to thereby improve the mass resolution. The algorithm employed is the "modified mass drop tagger" [52,53], also known as the "soft-drop" (SD) algorithm [54], with angular exponent β = 0, soft cutoff threshold z_cut < 0.1, and characteristic radius R_0 = 0.8 [54]. The corresponding SD jet mass is referred to as m_SD. The subjets within large-R jets are identified through a reclustering of their constituents using the Cambridge-Aachen algorithm [55,56] and then reversing the last step of the clustering history.
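The soft-drop criterion applied at each declustering step can be sketched as follows. This is a minimal illustration of the standard condition from Ref. [54], min(p_T1, p_T2)/(p_T1 + p_T2) > z_cut (ΔR_12/R_0)^β, and not the CMS implementation; the function name and argument layout are hypothetical, and the threshold 0.1 is taken from the parameter values quoted above.

```python
def passes_soft_drop(pt1, pt2, delta_r, z_cut=0.1, beta=0.0, r0=0.8):
    """Return True if a two-prong splitting satisfies the soft-drop condition.

    pt1, pt2: transverse momenta (GeV) of the two branches of the splitting.
    delta_r:  angular separation of the two branches.
    With beta = 0 the angular factor equals 1, so the condition reduces to
    a pure momentum-sharing requirement z > z_cut.
    """
    z = min(pt1, pt2) / (pt1 + pt2)
    return z > z_cut * (delta_r / r0) ** beta
```

In a grooming loop, a splitting that fails this test would have its softer branch discarded, and the procedure would continue on the harder branch until the condition is met.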
To identify jets originating from top quarks that decay according to t → Wb → qq̄′b (t tagging), we use the N-subjettiness variables [57] τ_3, τ_2, and τ_1, computed using the jet constituents according to

$$\tau_N = \frac{1}{\sum_k p_{T,k}\, R} \sum_k p_{T,k}\, \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\}, \tag{1}$$

where N denotes the number of reconstructed candidate subjets and k runs over the constituent particles in the jet [58]. The term min refers to the minimum value of the items within the curly brackets, and the variable ΔR_{i,k} = √((Δy_{i,k})² + (Δφ_{i,k})²) is the angular distance between the axis of candidate subjet i and constituent particle k, with R the distance parameter of the jet.

Small-R jets and subjets of large-R jets are identified as bottom quark candidates (b-tagged) using the combined secondary vertex (CSV) algorithm [61]. Data-to-simulation correction factors are used to match the b tagging efficiency observed in simulation to that measured in data. The typical efficiencies of the b tagging algorithm for small-R jets and subjets of large-R jets are, respectively, 63% and 58% for genuine b (sub)jets, while the misidentification probability for light-flavor (sub)jets is 1%. For the subjets of large-R jets, the efficiency for tagging genuine b subjets drops from 65% to 40% as the p_T increases from 20 GeV to 1 TeV.
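As an illustration of the N-subjettiness variables used for t tagging, the sketch below evaluates τ_N for a given set of candidate subjet axes using NumPy. It assumes the standard normalization by the scalar p_T sum of the constituents times the jet distance parameter, and it takes the subjet axes as an input, since the axis-finding step is not part of this example. All names are hypothetical.

```python
import numpy as np


def delta_r(y1, phi1, y2, phi2):
    """Angular distance in the rapidity-azimuth plane, with phi wrapped to [-pi, pi)."""
    dphi = np.mod(phi1 - phi2 + np.pi, 2.0 * np.pi) - np.pi
    return np.hypot(y1 - y2, dphi)


def n_subjettiness(constituents, axes, r0=0.8):
    """Compute tau_N for N = len(axes).

    constituents: array of shape (n, 3) with columns (pt, y, phi).
    axes:         array of shape (N, 2) with columns (y, phi).
    r0:           jet distance parameter (assumed 0.8 for large-R jets).
    """
    constituents = np.asarray(constituents, dtype=float)
    axes = np.asarray(axes, dtype=float)
    pt, y, phi = constituents.T
    # Distance of every constituent to every axis: shape (n, N).
    dr = delta_r(y[:, None], phi[:, None], axes[None, :, 0], axes[None, :, 1])
    d0 = pt.sum() * r0
    # Each constituent contributes its pt times the distance to the nearest axis.
    return float(np.sum(pt * dr.min(axis=1)) / d0)
```

A jet with three well-separated prongs yields a small τ_3 relative to τ_2, which is why a ratio such as τ_3/τ_2 discriminates three-pronged top quark decays from one- or two-pronged jets.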
The missing transverse momentum vector p⃗_T^miss is defined as the projection onto the plane perpendicular to the beam axis of the negative momentum vector sum of all PF candidates in an event. Its magnitude is referred to as p_T^miss, which is calculated after applying the aforementioned jet energy corrections.
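The definition of the missing transverse momentum can be sketched in a few lines of Python. This example only performs the negative vector sum in the transverse plane; the propagation of jet energy corrections mentioned above is not included, and the (p_T, φ) tuple layout is an assumption made for illustration.

```python
import math


def missing_pt(candidates):
    """Return (magnitude, phi) of the missing transverse momentum.

    candidates: iterable of (pt, phi) pairs for all PF candidates in the event.
    """
    candidates = list(candidates)
    # Negative vector sum of the transverse momenta.
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)
```

For two back-to-back candidates with p_T of 50 and 30 GeV, the imbalance is 20 GeV and points opposite to the harder candidate.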

V. EVENT SELECTION

A. Trigger
Different triggers were employed to collect signal events in the all-jet and l + jets channels, according to each event topology. The trigger used in the all-jet channel required the presence of a jet with p_T > 180 GeV at L1. At the HLT, large-R jets were reconstructed from PF candidates using the anti-k_T algorithm with a distance parameter of 0.8. The mass of the jets at the HLT, after removal of soft particles, was required to be greater than 30 GeV. Selected events had to contain at least two such jets with p_T > 280 and 200 GeV for the leading and trailing jets, respectively. Finally, at least one of these jets had to be b tagged using the CSV algorithm suitably adjusted for the HLT at an average identification efficiency of 90% for b jets. The aforementioned trigger ran for the entire 2016 data run, collecting an integrated luminosity of 35.9 fb⁻¹. A second trigger with identical kinematic criteria but without any b tagging requirement was employed and ran on average every 21 bunch crossings, collecting an integrated luminosity of 1.67 fb⁻¹. The events collected with the latter trigger were intended for use as a control data sample to estimate the multijet background in the all-jet channel, as described below. For the l + jets channel, the data were selected using triggers requiring a single lepton without imposing any isolation criteria, either an electron with p_T > 45 GeV and |η| < 2.5 or a muon with p_T > 40 GeV and |η| < 2.1, as well as two small-R jets with p_T > 200 and 50 GeV.

B. All-jet channel
The events considered in the all-jet final state are required to fulfill a common baseline selection. This requires the presence of at least two large-R jets in the event with p_T > 400 GeV, |η| < 2.4, and 50 < m_SD < 300 GeV. In addition, events with at least one lepton are vetoed to suppress leptonic final states originating from top quarks.
Jet substructure variables are used to discriminate between events that originate from tt̄ decays and multijet production. These are sensitive to the type of jet and in particular to whether the jet arises from a single parton, such as those in the case of ordinary quark or gluon evolutions into jets, or from three partons, such as in the t → Wb → qq̄′b decay considered here. The τ_1, τ_2, and τ_3 variables of the two large-R jets with highest p_T are combined through a neural network (NN) to form a multivariate discriminant that characterizes each event, with values close to zero indicating dijet production and values close to one favoring tt̄ production. These variables are chosen such that the correlation with the number of b-tagged subjets, which is used to define control regions for the multijet background, is minimal. The NN consists of two hidden layers with 16 and 4 nodes, implemented in the TMVA toolkit [62]. More complex architectures do not improve the discriminating capabilities of the NN. The training of the NN is performed with simulated multijet (background) and tt̄ (signal) events that satisfy the baseline selection, through the back-propagation method and a sigmoid activation function for the nodes. Excellent agreement between data and simulation is observed for the input variables in the phase space of the training.
Besides the baseline selection, subregions are defined based on the NN output, the m_SD of the jets, and the number of b-tagged subjets in each large-R jet. The signal region (SR) used to extract the differential measurements contains events collected with the signal trigger where both large-R jets contain a b-tagged subjet, have masses in the range of 120-220 GeV, and have NN output values greater than 0.8. This value is chosen to ensure that the ratio of tt̄ signal to background is large, while keeping a sufficient number of signal events with a top quark p_T > 1 TeV. In this region, more than 95% of the selected tt̄ events originate from all-jet top quark decays according to simulation. The multijet control region (CR) contains events collected via the control trigger that satisfy the same requirements as those in the SR, but with an inverted b tagging requirement. In addition, expanded regions that include both SR and CR events are defined to estimate background contributions. Signal region A (SR_A) and control region A (CR_A) are the same as the SR and CR but have an extended requirement on the m_SD of large-R jets of 50-300 GeV. It should be noted that the events selected in SR_A and CR_A were collected with the signal and control triggers, respectively. Finally, signal region B (SR_B) has the same selection criteria as the SR, except without an NN requirement, and is used to constrain some of the signal modeling uncertainties.
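The region definitions above can be summarized in a short Python sketch that assigns an event passing the baseline selection to the regions it belongs to. The data layout (one dictionary per large-R jet) and the function names are hypothetical, and the "inverted b tagging requirement" is interpreted here as neither jet containing a b-tagged subjet, which is an assumption of this example.

```python
def masses_in_window(jets, low, high):
    """True if the soft-drop masses (GeV) of all jets lie within (low, high)."""
    return all(low < jet["m_sd"] < high for jet in jets)


def all_jet_regions(jets, nn_output, signal_trigger, control_trigger, nn_cut=0.8):
    """Return the set of analysis regions for an event passing the baseline selection.

    jets: the two leading large-R jets, each a dict with keys
          'm_sd' (GeV) and 'n_btag_subjets'.
    """
    both_b = all(jet["n_btag_subjets"] >= 1 for jet in jets)
    no_b = all(jet["n_btag_subjets"] == 0 for jet in jets)  # assumed inversion
    tight_mass = masses_in_window(jets, 120.0, 220.0)
    loose_mass = masses_in_window(jets, 50.0, 300.0)
    nn_pass = nn_output > nn_cut

    regions = set()
    if signal_trigger and both_b:
        if tight_mass and nn_pass:
            regions.add("SR")
        if loose_mass and nn_pass:
            regions.add("SR_A")
        if tight_mass:
            regions.add("SR_B")  # same as SR, without the NN requirement
    if control_trigger and no_b and nn_pass:
        if tight_mass:
            regions.add("CR")
        if loose_mass:
            regions.add("CR_A")
    return regions
```

Written this way, the nesting of the regions is explicit: every SR event is also in SR_A and SR_B, and every CR event is also in CR_A.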

C. l + jets channel
The l + jets final state is identified through the presence of an electron or a muon, a small-R jet that reflects the bottom quark emitted in the t → Wb → lνb decay, and a large-R jet corresponding to the top quark decaying according to t → Wb → qq̄′b. Small-R (large-R) jets are required to have p_T > 50 (400) GeV and |η| < 2.4.
All events are required to pass the following preselection criteria: (i) exactly one electron or muon; (ii) no additional veto leptons; (iii) at least one small-R jet near the lepton, with 0.3 < ΔR(l, jet) < π/2; (iv) at least one large-R jet away from the lepton, with ΔR(l, jet) > π/2; (v) p_T^miss > 50 or 35 GeV for the electron or muon channel, respectively; and (vi) for events in the electron channel, a requirement ensuring that p⃗_T^miss does not point along the transverse direction of the electron or the leading jet, |Δφ(p⃗_T^X, p⃗_T^miss)| < 1.5 p_T^miss/(110 GeV), where X stands for the electron or the leading small-R jet. The more stringent p_T^miss selection and criterion (vi) in the electron channel are applied to further reduce background from multijet production.
Events that fulfill the preselection criteria are categorized according to whether the jet candidates pass or fail the relevant b or t tagging criteria. The b jet candidate is the highest-p_T leptonic-side jet in the event, while the t jet candidate is the highest-p_T jet on the nonleptonic side. The N-subjettiness ratio τ_3/τ_2 (abbreviated as τ_32) is used to distinguish a three-pronged top quark decay from background processes by requiring τ_32 < 0.81. In addition, the t jet candidate must have 105 < m_SD < 220 GeV. A data-to-simulation efficiency correction factor is extracted simultaneously with the integrated signal yield, as described in Sec. VII, to correct the t tagging efficiency in simulation to match that in data.
Events are divided into the following categories: (i) No t tags (0t): the t jet candidate fails the t tagging requirement; (ii) 1 t tag, no b tags (1t0b): the t jet candidate passes the t tagging requirement, but the b jet candidate fails the b tagging requirement; and (iii) 1 t tag, 1 b tag (1t1b): both the t jet candidate and the b jet candidate pass their respective tagging requirements. These event categories are designed to produce different admixtures of signal and background, with the 0t region having the most background and the 1t1b region having the most signal.
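The categorization of preselected l + jets events can be expressed compactly, as in the following sketch. The thresholds (τ_32 < 0.81 and 105 < m_SD < 220 GeV) are those quoted in the text; the function names and the use of a precomputed boolean for the b tagging decision are assumptions of this example.

```python
def is_t_tagged(tau32, m_sd):
    """t tagging requirement on the t jet candidate (m_sd in GeV)."""
    return tau32 < 0.81 and 105.0 < m_sd < 220.0


def lepton_jets_category(tau32, m_sd, b_candidate_tagged):
    """Return '0t', '1t0b', or '1t1b' for a preselected event.

    b_candidate_tagged: CSV b tagging decision for the b jet candidate.
    """
    if not is_t_tagged(tau32, m_sd):
        return "0t"
    return "1t1b" if b_candidate_tagged else "1t0b"
```

Note that the b tagging decision only matters once the t jet candidate is tagged, so the three categories are mutually exclusive and cover all preselected events.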

VI. BACKGROUND ESTIMATION
The dominant background in the all-jet channel is multijet production, while in the l + jets channel, the dominant sources of background include nonsignal tt̄, single top quark, W + jets, and multijet production events. Nonsignal tt̄ events, referred to as "tt̄ other," comprise dilepton (where one lepton is not identified) and all-jet final states (where a lepton arises from one of the jets), in addition to τ + jets events where the τ lepton decays hadronically.
In the all-jet channel, the background from multijet production is significantly suppressed through a combination of b tagging requirements for the subjets within the large-R jets and the event NN output, and it is estimated from a control data sample. The two items determined from data are the shape of the multijet background as a function of an observable of interest x and the absolute normalization N_multijet. The shape is taken from CR_A, where the tt̄ signal contamination, based on simulation, is about 1%. The value of N_multijet is extracted through a binned maximum likelihood fit to the data in SR_A of the m_SD of the t jet candidate, m^t, where the t jet candidate is taken as the large-R jet with highest p_T. The expected number of events D(m^t) is modeled according to

$$D(m^t) = N_{t\bar{t}}\, T(m^t; k_{\text{scale}}, k_{\text{res}}) + N_{\text{multijet}}\, (1 + k_{\text{slope}}\, m^t)\, Q(m^t) + N_{\text{bkg}}\, B(m^t), \tag{2}$$

which contains the distributions T(m^t) and B(m^t) of the signal and the subdominant backgrounds, respectively, taken from MC simulation, and the distribution Q(m^t) of the multijet background. To account for a possible difference in the multijet m^t dependence in the CR_A and SR_A, a multiplicative factor (1 + k_slope m^t) is introduced, inspired by the simulation, but with the slope parameter k_slope left free in the fit. Also free in the fit are the normalization factors N_tt̄, N_multijet, and N_bkg. Two additional nuisance parameters are introduced in the analytic parametrization of the m^t distribution for simulated tt̄ events, k_scale and k_res, which account for possible differences between data and simulation in the scale and resolution in the m^t parameter. The fit is performed using the ROOFIT toolkit [63], and the results are shown in Fig. 1 and Table I. The fitted tt̄ yield of 6238 ± 181 is significantly lower than the 9885 events expected in the SR_A according to tt̄ simulation and the theoretical cross section discussed in Sec. III, which implies that the fiducial cross section is smaller than the POWHEG+PYTHIA8 prediction, and corresponds to a fitted signal strength r = 0.64 ± 0.03. This result is consistent with the softer top quark p_T spectrum compared to NLO predictions that has been reported in previous measurements [10,13]. The fitted signal strength is used to scale down the expected tt̄ signal yields from the POWHEG+PYTHIA8 simulation in various SRs in the subsequent figures containing comparisons between data and simulations but not in the subsequent derivation of the differential cross sections. The nuisance parameters that control the scale and the resolution of the reconstructed mass are consistent with unity, confirming thereby the good agreement between data and simulation in this variable.
The subdominant background processes, namely single top quark production and vector bosons produced in association with jets, have a negligible contribution in the SR (less than 1% in the entire phase space) and are fixed to the predictions from simulation. Figure 2 shows the distribution in the NN output in the SR_B, and Figs. 3 and 4 show the p_T and absolute rapidity |y| of the two top quark candidates and the mass, p_T, and rapidity y of the tt̄ system, respectively. Also, the m_SD values of the two jets are shown in Fig. 5. The tt̄ and multijet processes are normalized according to the results of the fit in SR_A described above, while the yields in subdominant backgrounds are taken from simulation. Table II summarizes the event yields in the SR.
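The structure of the template model used in the all-jet mass fit can be illustrated with a binned NumPy sketch. It combines unit-normalized signal, multijet, and subdominant-background templates with free normalizations and the linear multijet slope correction, and it evaluates a Poisson negative log-likelihood. This is not the ROOFIT implementation used in the analysis: the binned templates, the function names, and the omission of the k_scale and k_res parameters are simplifying assumptions of this example.

```python
import numpy as np


def expected_events(m_centers, t_shape, q_shape, b_shape,
                    n_tt, n_multijet, n_bkg, k_slope):
    """Expected events per mass bin.

    m_centers: bin centers of the t jet candidate mass (GeV).
    t_shape, q_shape, b_shape: per-bin template fractions for signal,
        multijet, and subdominant backgrounds (assumed unit-normalized).
    """
    m = np.asarray(m_centers, dtype=float)
    t, q, b = (np.asarray(a, dtype=float) for a in (t_shape, q_shape, b_shape))
    # Linear correction for a possible SR_A versus CR_A shape difference.
    q_corrected = (1.0 + k_slope * m) * q
    return n_tt * t + n_multijet * q_corrected + n_bkg * b


def binned_nll(observed, expected):
    """Poisson negative log-likelihood, dropping terms independent of the model."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum(expected - observed * np.log(expected)))
```

A minimizer would vary n_tt, n_multijet, n_bkg, and k_slope to minimize `binned_nll`; the ratio of the fitted to the predicted signal yield then plays the role of the signal strength r.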
In the l + jets channel, background events from tt̄ other, single top quark, V + jets, and diboson production are estimated from simulation. The multijet background is modeled using a data sideband region defined by inverting the isolation requirement on the lepton and relaxing the lepton identification criteria. The predicted contributions from signal and other background events are subtracted from the data distribution in the sideband region to obtain the kinematic distributions for multijet events. The normalization of the multijet background is extracted from a maximum likelihood fit, discussed in Sec. VII B; an initial estimate of its normalization is taken as the simulated prediction. The normalizations of the other background processes are also constrained via the fit.

VII. SIGNAL EXTRACTION

A. All-jet channel
In the all-jet channel, the tt̄ signal is extracted from data by subtracting the contribution from the background. The signal is extracted as a function of seven separate variables, p_T and |y| of the leading and subleading t jet, as well as the mass, p_T, and y of the tt̄ system, according to

$$S(x) = D(x) - R_{\text{yield}}\, N_{\text{multijet}}\, Q(x) - B(x), \tag{3}$$

where x corresponds to one of the variables p_T^{t_i}, |y^{t_i}|, m^{tt̄}, p_T^{tt̄}, or y^{tt̄}; S(x) is the tt̄ signal distribution; D(x) is the measured distribution in data; Q(x) is the multijet distribution; and B(x) is the contribution from the subdominant backgrounds (for which both the distribution and the normalization are taken from simulation). These distributions refer to the SR. The variable N_multijet is the fitted number of multijet events in the SR_A. The factor R_yield is used to extract the number of multijet events in the SR from N_multijet, and it is found (in simulation) to be independent of the b tagging requirement. This allows it to be estimated from the control data sample as the ratio of the multijet yields in the CR and the CR_A. The uncertainty in R_yield includes the statistical uncertainty of the data and the systematic uncertainty of the method as obtained with simulated events.
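The background subtraction in the all-jet channel amounts to a bin-by-bin operation, sketched below with NumPy. The multijet shape is normalized to unit area inside the function, which is an assumption of this example about how Q(x) is defined; the function and argument names are hypothetical.

```python
import numpy as np


def extract_signal(data, multijet_shape, other_bkg, n_multijet, r_yield):
    """Return the background-subtracted signal distribution in the SR.

    data:           observed events per bin in the SR.
    multijet_shape: multijet distribution from the control sample
                    (any normalization; rescaled to unit area here).
    other_bkg:      simulated subdominant-background yield per bin.
    n_multijet:     fitted multijet yield in SR_A.
    r_yield:        factor converting the SR_A yield into the SR yield.
    """
    q = np.asarray(multijet_shape, dtype=float)
    q = q / q.sum()
    multijet_sr = r_yield * n_multijet * q
    return np.asarray(data, dtype=float) - multijet_sr - np.asarray(other_bkg, dtype=float)
```

Because the multijet shape and normalization are both taken from data, only the small subdominant-background term relies on simulation in this step.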

B. l + jets channel
In the l + jets channel, the tt̄ signal strength, the scale factor for the t tagging efficiency, and the background normalizations are extracted through a simultaneous binned maximum-likelihood fit to the data across the different analysis categories. The 0t, 1t0b, and 1t1b categories are fitted simultaneously, normalizing each background component to the same cross section in all categories. The resulting fit is expressed in terms of a multiplicative factor, the signal strength r, applied to the input tt̄ cross section. Different variables are used to discriminate the tt̄ signal from the background processes. The small-R jet η distribution is used in the 0t and 1t0b categories, while the large-R jet m_SD distribution is used in the 1t1b region. These distributions were chosen as they provide good discrimination between tt̄, W + jets, and multijet production, as tt̄ events tend to be produced more centrally than the background, and the m_SD distribution peaks near the top quark mass. The tt̄ signal and tt̄ background contributions merge into a single distribution in the fit, essentially constraining the leptonic branching fraction to equal that provided in the simulation.
Background normalizations and experimental sources of systematic uncertainty are treated as nuisance parameters in the fit. The uncertainties from the pileup reweighting, lepton scale factors, JES, JER, and b and t tagging efficiencies are treated as uncertainties in the input distributions. Two separate nuisance parameters are used to describe the t tagging uncertainty: one for the t tagging scale factor applied to the tt̄ and single top quark (tW) events, where we expect the t-tagged jet to correspond to a genuine top quark, and one for the t misidentification scale factor applied to the remaining background. The uncertainties in the integrated luminosity and background normalizations are treated as uncertainties in the production cross sections of the backgrounds. The event categories in the fit are designed such that the t tagging efficiency is constrained by the relative population of events in the three categories. The different admixtures of the signal and background events between the categories provide constraints on the background normalizations. The measurement of the signal strength is correlated with various nuisance parameters, with the strongest correlation being with the t tagging efficiency, as expected. To determine the uncertainties in distributions, the nuisance parameter is used to interpolate between the nominal distribution and distributions corresponding to ±1 standard deviation changes in the given uncertainty. The uncertainties from theoretical modeling are evaluated independently from the fit. The fit is performed by minimizing a joint binned likelihood constructed from the kinematic distributions in the e + jets and μ + jets channels, with most nuisance parameters constrained to be identical in both channels. The nuisance parameters associated with the electron and muon scale factors are treated separately, as are the normalizations of the multijet background in the electron and muon channels.
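The interpolation between the nominal distribution and its ±1 standard deviation variations can be illustrated with a simple piecewise-linear scheme. The text does not state which interpolation function is used in the fit, so the linear form below is an assumption chosen for clarity, and the function name is hypothetical.

```python
import numpy as np


def morph_template(nominal, up, down, theta):
    """Interpolate a binned template as a function of a nuisance parameter.

    nominal, up, down: per-bin yields for the nominal distribution and for
        the +1 and -1 standard deviation variations.
    theta: nuisance parameter in units of standard deviations
        (0 gives the nominal template, +1 gives `up`, -1 gives `down`).
    """
    nominal = np.asarray(nominal, dtype=float)
    up = np.asarray(up, dtype=float)
    down = np.asarray(down, dtype=float)
    if theta >= 0.0:
        return nominal + theta * (up - nominal)
    return nominal - theta * (down - nominal)
```

In a fit, each such parameter would typically carry a unit-Gaussian constraint term, so that the data can pull the template away from the nominal shape only at a likelihood cost.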
The event yields that account for shifts in all nuisance parameters are given in Table III. The posterior kinematic distributions for the three event categories are shown in Fig. 6. Figure 7 shows the p_T and y distributions for the t jet candidate in each of the three event categories for the combined l + jets channel. All distributions use the posterior t tagging scale factors and background normalizations, but not the posterior values of other nuisance parameters. The posterior t tagging efficiency and misidentification scale factors are 1.04 ± 0.06 and 0.79 ± 0.06, with an additional p_T- and η-dependent uncertainty in the ranges of 1%-8% and 1%-13%, respectively. The fitted background normalizations are generally in good agreement with their corresponding prefit values.
The posterior signal strength determined in the fit is 0.81 ± 0.05; i.e., the tt̄ simulation is observed to overestimate the data by roughly 25% in the region of the fiducial phase space. The measured signal strength extracted from the fit serves as an indicator of the level of agreement between the measured integrated tt̄ cross section and the prediction from simulation.
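The structure of such a fit can be sketched with a toy example: a binned Poisson likelihood with a signal strength r and a single background-normalization nuisance parameter under a Gaussian constraint. This is a simplified illustration, not the analysis code; the templates, the 30% prior, the grid scan, and the names `nll` and `fit` are all assumptions.

```python
import math


def nll(r, theta, data, signal, background, bkg_unc=0.30):
    """Negative log-likelihood (up to constants): Poisson term per bin plus a
    unit-Gaussian constraint on the background nuisance parameter theta."""
    total = 0.5 * theta ** 2
    for n, s, b in zip(data, signal, background):
        nu = r * s + b * (1.0 + bkg_unc * theta)  # expected yield in this bin
        if nu <= 0.0:
            return math.inf
        total += nu - n * math.log(nu)
    return total


def fit(data, signal, background):
    """Coarse grid scan over (r, theta); a real fit would use a minimizer."""
    best_value, best_r, best_theta = math.inf, None, None
    for i in range(0, 201):        # r from 0.00 to 2.00 in steps of 0.01
        r = i / 100.0
        for j in range(-30, 31):   # theta from -3.0 to 3.0 in steps of 0.1
            theta = j / 10.0
            value = nll(r, theta, data, signal, background)
            if value < best_value:
                best_value, best_r, best_theta = value, r, theta
    return best_r, best_theta


# Hypothetical templates; the pseudo-data are built with r = 0.8 so that the
# fit should recover that value (a closure check, not a physics result).
signal = [50.0, 120.0, 60.0]
background = [200.0, 40.0, 10.0]
data = [0.8 * s + b for s, b in zip(signal, background)]
r_hat, theta_hat = fit(data, signal, background)
```

The different signal-to-background admixtures across bins are what allow r and the background normalization to be constrained simultaneously, mirroring the role of the three event categories in the text.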

VIII. SYSTEMATIC UNCERTAINTIES
The systematic uncertainties originate from both experimental and theoretical sources. The former include all those related to differences in performance in particle reconstruction and identification between data and simulation, as well as in the modeling of background. The latter are related to the MC simulation of the tt̄ signal process and primarily affect the unfolded results through the acceptance, efficiency, and migration matrices. Each systematic variation produces a change in the measured differential cross section, and that difference, relative to the nominal result, defines the effect of the variation on the measurement.
The dominant experimental sources of the systematic uncertainty in the all-jet channel are the JES and the subjet b tagging efficiency. In the l + jets channel, the efficiencies in t and b tagging provide the largest contributions to the uncertainties. The different sources are described below:
(i) Multijet background (all jet): The fitted multijet yield as well as the uncertainty in R_yield in Eq. (3) impact the distribution of the signal events as a function of each variable of interest. These effects are estimated to be about 1% from a comparison of the distribution in each variable of the SR with its CR (as described in Sec. V) in simulated events, as well as for different pileup profiles in data collected with the control trigger relative to the signal trigger. The uncertainty in R_yield is dominated by the assumption of the extraction method (estimated through simulated events), while the statistical contribution is smaller.
(ii) Subdominant backgrounds (all jet): The expected yield from the subdominant backgrounds estimated from simulation (single top quark production and vector bosons produced in association with jets) is changed by ±50%, leading to a negligible uncertainty (less than 1%).
(iii) Background estimate (l + jets): An a priori uncertainty of 30% is applied to the single top quark and W + jets background normalizations, to cover a possible mismodeling of these background sources in the region of phase space probed in the analysis. An additional uncertainty in the flavor composition of the W + jets process is estimated by changing the light- and heavy-flavor components independently by their 30% normalization uncertainties. For the multijet normalization, an a priori uncertainty of 50% is used to reflect the combined uncertainty in the normalization and the extraction of the kinematic contributions from the sideband region in data. These background sources and the corresponding systematic uncertainties are all constrained in the maximum likelihood fit.
(iv) JES: The uncertainty in the energy scale of each reconstructed large-R jet is a leading experimental contribution in the all-jet channel. It is divided into 24 independent sources [50], and each variation provides a new jet collection with which the event interpretation is repeated. This results not only in changes in the p_T scale but can also lead to different t jet candidates. The p_T- and η-dependent JES uncertainty is about 1%-2% per jet. The resulting uncertainty in the measured cross section is typically about 10% but can be much larger at high top quark p_T. For the l + jets channel, the uncertainty in JES is estimated for both small-R and large-R jets by shifting the jet energy in simulation up or down by their p_T- and η-dependent uncertainties, with a resulting impact on the differential cross section of 1%-10%.
(v) JER: The impact of the JER is determined by smearing the jets according to the JER uncertainty [50]. The effect on the cross section is relatively small, at the level of 2%.
(vi) t tagging efficiency (l + jets): The t tagging efficiency and its associated uncertainty are extracted simultaneously with the signal strength and background normalizations in the likelihood fit of the l + jets analysis, discussed in Sec. VII. The uncertainty in the t tagging efficiency is in the range 6%-10%, while for the misidentification rate, it is 8%-15%, depending on the p_T and η of the t jet.
(vii) Subjet b tagging efficiency (all jet): The uncertainty in the identification of b subjets within the large-R jets (estimated in Ref. [61]) is the leading experimental uncertainty in the all-jet channel. The effect on the cross sections is about 10%, relatively independent of the observables. Unlike the uncertainty associated with JES, the b-subjet tagging uncertainty largely cancels in the normalized cross sections.
(viii) b tagging efficiency (l + jets): For the l + jets channel, the small-R jet b tagging efficiency in the simulation is corrected to match that measured in data using p_T- and η-dependent scale factors [61]. The resulting uncertainty in the differential cross sections is about 1%-2%. The b tagging efficiency and non-b jet misidentification uncertainties are treated as fully correlated.
(ix) Pileup: The uncertainty related to the pileup modeling is subdominant. The impact on the measurement is estimated by changing the total inelastic cross section used to reweight the simulated events by ±4.6% [64]. The effect on the cross sections is negligible (less than 1%).
(x) Trigger (all jet): The uncertainty associated with the trigger, accounting for the difference between the simulated and observed trigger efficiency, is well below 1% in the phase space of the all-jet channel. The measurement of the trigger efficiency is performed in events collected with an orthogonal trigger that requires the presence of an isolated muon with p_T greater than 27 GeV.
(xi) Lepton identification and trigger (l + jets): The performance of the lepton identification, reconstruction, trigger, and isolation constitutes a small source of systematic uncertainty. Correction factors used to modify the simulation to match the efficiencies observed in data are estimated through a tag-and-probe method using Z → ll decays. The corresponding uncertainty is determined by changing the correction factors up or down by their uncertainties. The resulting systematic uncertainties depend on lepton p_T and η and are in the range 1%-7% (1%-5%) for electrons (muons).
(xii) Integrated luminosity: The uncertainty in the measurement of the integrated luminosity is 2.5% [65].
The theoretical uncertainties are divided into two subcategories: sources of systematic uncertainty related to the matrix element calculations of the hard scattering process and sources related to the modeling of the parton shower and the underlying event. The first category (consisting of the first three sources below) is evaluated using variations of the simulated event weights, while the second category is evaluated with dedicated, alternative MC samples with modified parameters. These sources are:
(i) Parton distribution functions: The uncertainty from PDFs is estimated by applying event weights corresponding to the 100 replicas of the NNPDF PDFs [35]. For each observable, we compute its standard deviation from the 100 variants.
[...] with α_S is estimated by applying event weights corresponding to higher or lower values of α_S for the matrix element, using the varied NNPDF PDFs [35] with α_S = 0.117 or 0.119, compared to the nominal value of 0.118.
(iv) ISR and FSR: The uncertainty in the ISR and FSR is estimated from alternative MC samples with reduced or increased values of α_S used in PYTHIA to generate that radiation. The scale in the ISR is changed by factors of 2 and 0.5, and the scale in the FSR is changed by factors of √2 and 1/√2 [66]. In the all-jet channel, the FSR uncertainty is constrained by a fit to the data in SR_B, using the NN output that is sensitive to the modeling of FSR. This leads to a reduced uncertainty that is 0.3 times the variations from the alternative MC samples.
(v) Matching of the matrix element to the parton shower: In the POWHEG matching of the matrix element to the parton shower (ME-PS), the resummed gluon damping factor h_damp is used to regulate high-p_T radiation. The nominal value is h_damp = 1.58 m_t. Uncertainties in h_damp are parametrized by considering alternative simulated samples with h_damp = m_t and h_damp = 2.24 m_t [37].
(vi) Underlying event tune: This uncertainty is estimated from alternative MC samples using the CUETP8M2T4 parameters varied by ±1 standard deviation [37].
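The replica-based PDF prescription in item (i) can be sketched as a per-bin standard deviation over the replica histograms. The sketch assumes the sample standard deviation (N − 1 in the denominator), which the text does not specify; the function name and the toy inputs are hypothetical.

```python
import statistics


def pdf_uncertainty(replica_histograms):
    """Per-bin standard deviation of an observable over PDF replicas.

    replica_histograms: list of histograms (one per replica), each a list of
    bin contents obtained by reweighting the simulated events.
    Assumption: the sample standard deviation is used.
    """
    if len(replica_histograms) < 2:
        raise ValueError("at least two replicas are required")
    # zip(*...) transposes the input so that each column holds one bin's
    # contents across all replicas.
    return [statistics.stdev(column) for column in zip(*replica_histograms)]


# Toy example with two replicas and two bins (the analysis uses 100 replicas).
replicas = [[1.0, 10.0], [3.0, 10.0]]
pdf_unc = pdf_uncertainty(replicas)
```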

IX. CROSS SECTION MEASUREMENTS
Here, we discuss the differential tt̄ production cross sections measured in the all-jet and l + jets channels as a function of different kinematic variables of the top quark or tt̄ system, corrected to the particle and parton levels using an unfolding procedure. The measurements are compared to predictions from different MC event generators.

A. Definition of particle and parton levels
The parton-level phase space to which the measurement is unfolded is constrained by the kinematic requirements of the detector-level fiducial region. Namely, in the all-jet decay channel, the t and t̄ must have p_T > 400 GeV and |η| < 2.4. In addition, m_tt̄ > 800 GeV is required to avoid extreme events with large top quark p_T and small m_tt̄.
The parton-level definition for the l + jets channel differs in that it is defined for l + jets events, where one top quark decays according to t → Wb → qq̄′b and has p_T > 400 GeV, to match the fiducial requirement at the detector level, and the other top quark decays as t → Wb → lνb without any p_T requirement.
The so-called particle level represents the state of quasistable particles with a mean lifetime greater than 30 ps originating from the pp collision after hadronization but before the interaction of these particles in the detector. The observables computed from the momenta of particles are typically better defined than those computed from parton-level information. Also, the associated phase space is closer to the fiducial phase space of the measurement at the detector level, which leads to smaller theoretical uncertainties. In the context of this analysis, particle jets are reconstructed from quasistable particles, excluding neutrinos, using the anti-k_T algorithm with a distance parameter of 0.8 (identical to the reconstruction at detector level) and only the particles originating from the primary interaction. Subsequently, jets that are geometrically matched to generated leptons within ΔR < 0.4 in η-ϕ (i.e., from the leptonic decays of W bosons) are removed from the particle jet collection.
For the all-jet channel, the two particle jets with highest p_T are considered the particle-level t jet candidates. To match the fiducial phase space as closely as possible, the same kinematic selection criteria are applied as for the detector-level events. In particular, the particle-level jets must have p_T > 400 GeV and |η| < 2.4, while the mass of each jet must be in the 120-220 GeV range, and the invariant mass of the two jets must be greater than 800 GeV. The matching efficiency between the particle-level t jet candidates and the original top quarks at the parton level lies between 96% and 98%.
The particle-level phase space for the l + jets channel is set up to mimic the kinematic selections at the detector level. Particle-level large-R jets are selected if they have p_T > 400 GeV, |η| < 2.4, and a jet mass in the range 105-220 GeV; these are referred to as particle-level t jets. Particle-level small-R jets are selected if they have p_T > 50 GeV and |η| < 2.4 and are flagged as b jets (contain a b hadron); these are referred to as particle-level b jets. Particle-level electrons and muons are selected if they have p_T > 50 GeV and |η| < 2.1. To fulfill the particle-level selection criteria, an event must contain at least one t jet, at least one b jet, and at least one electron or muon, all at the particle level.
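The l + jets particle-level requirements listed above can be written as a short selection function. This is a minimal sketch: the `Jet` and `Lepton` containers and the function names are hypothetical, the inputs are assumed to be already clustered and cleaned particle-level objects, and treating the mass window boundaries as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    pt: float                   # transverse momentum in GeV
    eta: float                  # pseudorapidity
    mass: float = 0.0           # jet mass in GeV (used for large-R jets)
    has_b_hadron: bool = False  # b flag (used for small-R jets)


@dataclass
class Lepton:
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity


def is_t_jet(jet: Jet) -> bool:
    """Particle-level t jet: large-R jet within the kinematic and mass window."""
    return jet.pt > 400.0 and abs(jet.eta) < 2.4 and 105.0 <= jet.mass <= 220.0


def is_b_jet(jet: Jet) -> bool:
    """Particle-level b jet: small-R jet containing a b hadron."""
    return jet.pt > 50.0 and abs(jet.eta) < 2.4 and jet.has_b_hadron


def is_selected_lepton(lepton: Lepton) -> bool:
    """Particle-level electron or muon passing the kinematic requirements."""
    return lepton.pt > 50.0 and abs(lepton.eta) < 2.1


def passes_particle_level(large_jets: List[Jet], small_jets: List[Jet],
                          leptons: List[Lepton]) -> bool:
    """At least one t jet, one b jet, and one electron or muon are required."""
    return (any(is_t_jet(j) for j in large_jets)
            and any(is_b_jet(j) for j in small_jets)
            and any(is_selected_lepton(l) for l in leptons))
```

For example, an event with a 450 GeV large-R jet of mass 170 GeV, a 60 GeV b-flagged small-R jet, and a 55 GeV lepton at |η| = 1.2 passes, while the same event with a large-R jet mass of 90 GeV does not.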
To quantify the overlap in the definitions of detector-, particle-, and parton-level phase space, we define two fractions, f_1 and f_2, where f_1 is the fraction of reconstructed events that pass the selection at the unfolded level (parton or particle) in the same observable range and f_2 is the fraction of generated events at the unfolded level that are selected at the reconstruction level. Figure 8 presents these fractions at the parton and particle levels for the all-jet channel, as a function of the leading top quark p_T and |y|. The fraction f_1 is a function of the leading reconstructed top quark, and f_2 is a function of the leading top quark at parton or particle level. The distribution of f_1 vs p_T shows a characteristic threshold behavior due to the resolution in p_T, while f_1 is independent of |y|. The f_2 value decreases with p_T, primarily due to the inefficiency of subjet b tagging and the dependence of the NN output on p_T (at high jet p_T, it is more difficult to differentiate between ordinary jets and highly boosted top quarks). Also, f_2 decreases at high |y| values due to the increased inefficiency in b tagging at the edges of the CMS tracker.
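The two fractions can be computed from simulated events as in the following sketch. It assumes one reading of the definitions: each event carries a reconstructed bin index and an unfolded-level bin index, with `None` marking an event that fails the corresponding selection. The function name is hypothetical, and event weights are ignored for brevity.

```python
from collections import Counter


def overlap_fractions(events, n_bins):
    """Return (f1, f2) per bin from (reco_bin, truth_bin) pairs.

    f1[i]: fraction of events reconstructed in bin i that also pass the
           unfolded-level (parton or particle) selection.
    f2[i]: fraction of events generated in unfolded-level bin i that are
           also selected at the reconstruction level.
    """
    reco_total, reco_matched = Counter(), Counter()
    truth_total, truth_matched = Counter(), Counter()
    for reco_bin, truth_bin in events:
        if reco_bin is not None:
            reco_total[reco_bin] += 1
            if truth_bin is not None:
                reco_matched[reco_bin] += 1
        if truth_bin is not None:
            truth_total[truth_bin] += 1
            if reco_bin is not None:
                truth_matched[truth_bin] += 1
    f1 = [reco_matched[i] / reco_total[i] if reco_total[i] else 0.0
          for i in range(n_bins)]
    f2 = [truth_matched[i] / truth_total[i] if truth_total[i] else 0.0
          for i in range(n_bins)]
    return f1, f2


# Toy sample of five events in two bins.
events = [(0, 0), (0, None), (1, 1), (None, 1), (1, 0)]
f1, f2 = overlap_fractions(events, n_bins=2)
```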

B. Unfolding
We extract the differential cross sections by applying an unfolding procedure, which is necessary due to the finite resolution of the detector. The unfolded cross sections are evaluated as

$$\frac{d\sigma^{\mathrm{unf}}_{i}}{dx} = \frac{1}{L\,\Delta x_{i}}\,\frac{1}{f_{2,i}} \sum_{j} R^{-1}_{ij}\, f_{1,j}\, S_{j}, \qquad (4)$$

where L is the total integrated luminosity and Δx_i is the width of the ith bin of the observable x. The quantity R^{-1}_{ij} is the inverse of the migration matrix between the ith and jth bins, and S_j is the signal yield in the jth bin computed from Eq. (3). The binning of the various observables is chosen such that the purity (fraction of reconstructed events for which the true value of the observable lies in the same bin) and the stability (fraction of true events where the reconstructed observable lies in the same bin) are well above 50% for most of the bins. This choice results in migration matrices with suppressed nondiagonal elements, shown for the all-jet channel in Fig. 9 and for the l + jets channel in Fig. 10. To avoid the biases that regularized unfolding methods can introduce, we use migration-matrix inversion, as written in Eq. (4) and implemented in the TUnfold framework [67], at the price of a moderate increase in statistical uncertainty compared to regularized methods. For the all-jet channel, the fractions f_1 and f_2 in Eq. (4) are determined independently from the unfolding, as described in Sec. IX A and shown in Fig. 8. For the l + jets channel, both the reconstruction efficiencies and bin migrations are accounted for directly via TUnfold.
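Unregularized unfolding by matrix inversion can be illustrated with a small closure test. The sketch assumes that the unfolded cross section in bin i is obtained by correcting the signal yields with f_1, inverting the migration matrix, and dividing by f_2, the integrated luminosity, and the bin width. It does not use TUnfold; the solver, the function names, and all numerical inputs are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("migration matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


def unfold(signal, migration, f1, f2, lumi, widths):
    """Differential cross section per unfolded bin via matrix inversion.

    migration[j][i] is the probability for an event in unfolded-level bin i
    to be reconstructed in bin j (assumed convention: columns sum to one).
    """
    corrected = [f * s for f, s in zip(f1, signal)]   # remove out-of-range reco events
    matched_truth = solve(migration, corrected)       # undo bin migrations
    return [t / (f * lumi * w) for t, f, w in zip(matched_truth, f2, widths)]


# Closure test with hypothetical inputs: fold a known spectrum, then unfold it.
migration = [[0.9, 0.2], [0.1, 0.8]]
f1, f2 = [0.90, 0.95], [0.30, 0.25]
lumi, widths = 35.9, [100.0, 200.0]
true_xsec = [2.0, 0.5]
matched = [x * lumi * w * f for x, w, f in zip(true_xsec, widths, f2)]
folded = [sum(migration[j][i] * matched[i] for i in range(2)) for j in range(2)]
signal = [n / f for n, f in zip(folded, f1)]
unfolded = unfold(signal, migration, f1, f2, lumi, widths)
```

Because no regularization is applied, the closure is exact up to rounding; with real data, statistical fluctuations in the signal yields are amplified by the off-diagonal elements, which is why the binning is chosen to keep them small.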

C. All-jet channel
For the all-jet channel, the measurement of the unfolded differential cross section in bin i of the variable x is performed using Eq. (4). To estimate the uncertainty in the measurement, the entire procedure of the signal extraction, unfolding with different response matrices, and extrapolation to the particle- or parton-level phase space is repeated for every source of uncertainty discussed in Sec. VIII. The unfolded cross sections at the particle (parton) level are shown in the corresponding figures. Figures 17 and 18 show a summary of the statistical and the dominant systematic uncertainties in the differential cross section, as a function of the leading top quark p_T and |y| at the particle and parton levels, respectively.

D. l + jets channel
In the l + jets channel, the differential tt̄ cross section is measured as a function of the p_T and |y| of the top quark that decays according to t → Wb → qq̄′b. At the particle level, the measurement is defined in a region of phase space that mimics the event selection criteria, as detailed in Sec. IX A, while at the parton level it corresponds to the phase space where the nonleptonically decaying top quark has p_T > 400 GeV. The l + jets tt̄ events are selected at the parton level, and the properties of the nonleptonically decaying top quarks are defined to represent the true top quark p_T values.
The differential cross section is extracted from the signal-dominated 1t1b category. The measured signal distribution is determined by subtracting the estimated background contributions from the distribution in data, using the posterior normalizations from the fit given in Table III. To account for reconstruction efficiencies and bin migrations in signal, we use unregularized unfolding as described in Sec. IX B. The unfolding relies on response matrices that map the p_T and |y| distributions for the t-tagged jet to corresponding properties for either the particle-level t jet candidate or the parton-level top quark.
Systematic uncertainties in the unfolded measurement receive contributions from the experimental and theoretical sources discussed in Sec. VIII. The posterior values from the likelihood fit are used for the t tagging efficiency, background normalizations, and lepton efficiencies, while the a priori values are used for the remaining uncertainties. For each systematic change that affects the distribution in p_T or |y|, we define a separate response matrix that is used to unfold the data. The resulting uncertainties are added in quadrature to obtain the total uncertainty in the unfolded distribution.
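The combination of the individual sources can be sketched as a per-bin quadrature sum of the shifts of each varied unfolded result relative to the nominal one. The sketch assumes one varied result per source; the treatment of separate up and down variations is not specified here, and the function name and numbers are hypothetical.

```python
import math


def total_systematic(nominal, varied_by_source):
    """Per-bin total uncertainty from shifts relative to the nominal result.

    nominal: unfolded cross section per bin.
    varied_by_source: mapping from source name to the unfolded result obtained
    with that systematic variation (same binning as nominal).
    """
    totals = []
    for i, nom in enumerate(nominal):
        squared = sum((varied[i] - nom) ** 2
                      for varied in varied_by_source.values())
        totals.append(math.sqrt(squared))
    return totals


# Toy example with two bins and two sources.
nominal = [10.0, 20.0]
varied = {"JES": [10.3, 20.0], "t tagging": [10.4, 21.0]}
total = total_systematic(nominal, varied)
```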
The data in the electron and muon channels are combined before the unfolding by adding the measured distributions and their response matrices into a single channel. The background contributions are also merged into a single channel before subtracting these from the measured distributions, with the exception of the electron and muon multijet backgrounds that are treated as separate sources.
The unfolded cross sections for top quarks are shown in Figs. 19 and 20 as a function of p_T and |y| for the particle and parton levels, respectively, and compared to results from POWHEG interfaced with PYTHIA or HERWIG++ and from MadGraph5_aMC@NLO interfaced with PYTHIA.
The breakdowns of the sources of systematic uncertainty are given in Figs. 21 and 22. The cross section at the parton level as a function of the p_T of the top quark that decays as t → Wb → qq̄′b presented in this paper can also be compared to the corresponding measurement from CMS in the resolved final state [19]. The two measurements are observed to be in agreement in the region of phase space where they overlap.

E. Discussion
The unfolded cross sections at the particle and parton levels reveal some important features. Theory predictions of the integrated cross sections, obtained using POWHEG normalized as described in Sec. III, are 56% and 25% higher than our measurement for the all-jet and l + jets channels, respectively, which is consistent with previous results [20]. It should be noted that the two channels probe different phase spaces of the tt̄ production, due to the kinematic requirement on the subleading top quark in the all-jet channel, and therefore the integrated cross sections are not expected to be the same. That is, the phase space probed in the all-jet channel requires two top quarks with p_T above 400 GeV, while the l + jets channel phase space only requires one such high-p_T top quark. In terms of the normalized differential distributions, there is agreement between the data and theory within the uncertainties of the measurement, and some qualitative observations can be made by comparing the central values of the data and theory. There is good agreement for the p_T of the leading top quark (all-jet channel) and the p_T of the top quark that decays as t → Wb → qq̄′b (l + jets channel), while the cross section as a function of the p_T of the subleading top quark in the all-jet channel appears to be softer in data than for the POWHEG predictions, with MadGraph5_aMC@NLO providing the best description. The distributions in y are well described by theory in both channels, with a small deviation for the subleading top quark that is related to the difference in the p_T spectrum. Finally, the measured distributions for the tt̄ system are mostly in agreement with theory, with a possible deviation in the m_tt̄ variable, where POWHEG tends to produce a harder spectrum, while MadGraph5_aMC@NLO is fully consistent with the data.
Regarding systematic uncertainties, it should be noted that they are in general larger for the all-jet channel because the two leading experimental sources, JES and b tagging, enter twice (two large-R jets). In contrast, the uncertainty in parton showering is smaller for the all-jet channel because its main contribution (FSR) is constrained through a dedicated analysis, as discussed in Sec. VIII.

FIG. 17. Breakdown of the uncertainties in the absolute (left column) and normalized (right column) measurement at the particle level, as a function of the leading top quark p_T (upper row) and |y| (lower row) in the all-jet channel. The shaded band shows the statistical uncertainty, while the solid lines show the systematic uncertainties grouped in four categories: a) uncertainty due to pileup and the JES and JER of the large-R jets, b) uncertainty due to flavor tagging of the subjets, c) uncertainty due to the modeling of the parton shower, and d) uncertainty due to the modeling of the hard scattering.

FIG. 18. Breakdown of the uncertainties in the absolute (left column) and normalized (right column) measurement at the parton level, as a function of the leading top quark p_T (upper row) and |y| (lower row) in the all-jet channel. The shaded band shows the statistical uncertainty, while the solid lines show the systematic uncertainties grouped in four categories: a) uncertainty due to pileup and the JES and JER of the large-R jets, b) uncertainty due to flavor tagging of the subjets, c) uncertainty due to the modeling of the parton shower, and d) uncertainty due to the modeling of the hard scattering.

FIG. 19. Differential cross section measurements at the particle level, as a function of the particle-level t jet p_T (upper row) and |y| (lower row) for the l + jets channel. Both absolute (left column) and normalized (right column) cross sections are shown. The lower panel shows the ratio (MC/data) − 1. The vertical bars on the data and in the ratio represent the statistical uncertainty in data, while the shaded band shows the total statistical and systematic uncertainty added in quadrature. The hatched bands show the statistical uncertainty of the MC samples.

FIG. 21. Breakdown of the sources of systematic uncertainty affecting the differential cross section measurements in the l + jets channel at the particle level as a function of the particle-level t jet p_T (upper row) or |y| (lower row). Both the systematic uncertainties in the absolute (left column) and the normalized (right column) cross sections are shown. "JES + JER + b tagging" includes uncertainties due to the JES, JER, and small-R jet b tagging efficiency; "t tagging" is the uncertainty associated with the large-R jet t tagging efficiency; "Other experimental" includes the uncertainties originating from the background estimate, pileup modeling, lepton identification and trigger efficiency, and measurement of the integrated luminosity; "Parton shower" includes contributions from ISR and FSR, underlying event tune, ME-PS matching, and color reconnection; "Hard scattering" includes the uncertainty due to PDFs, as well as renormalization and factorization scales. The gray bands show the statistical uncertainty.

X. CONCLUSIONS
A measurement was presented of the top quark pair (tt̄) cross section for top quarks with high transverse momentum (p_T) produced in pp collisions at 13 TeV. The measurement uses events in which either one or both top quarks decay to jets and where the decay products cannot be resolved but are instead clustered in a single large-radius (R) jet with p_T > 400 GeV. The all-jet final state contains two such large-R jets, while the lepton + jets final state is identified through the presence of an electron or muon, a b-tagged jet, missing transverse momentum from the escaping neutrino, and a single t-tagged, large-R jet. The measurement utilizes a larger dataset relative to previous results to explore a wider phase space of tt̄ production and to elucidate any discrepancies with theory that were reported in previous publications. For the all-jet channel, absolute and normalized differential cross sections are measured as functions of the leading and subleading top quark p_T and absolute rapidity |y| and as a function of the invariant mass, p_T, and y of the tt̄ system, unfolded to the particle level within a fiducial phase space and to the parton level. For the lepton + jets channel, the differential cross sections are measured as functions of the p_T and |y| of the top quark that decays according to t → Wb → qq̄′b, both at the particle and parton levels. The results are compared with theory using the POWHEG matrix element generator, interfaced to either PYTHIA or HERWIG++ for the underlying event and parton showering, and with the MadGraph5_aMC@NLO matrix element generator, interfaced to PYTHIA. All the models significantly overestimate the measured absolute cross section in the phase spaces of the measurements. However, the normalized differential cross sections are consistently well described. The most notable discrepancies are observed in the invariant mass of the tt̄ system and the subleading top quark p_T in the all-jet channel, where theory predicts a higher cross section at high mass and at high p_T, respectively. To further investigate the severity of this discrepancy, more data are needed to enhance the statistical significance of the measurement in this region of phase space.

FIG. 22. Breakdown of the sources of systematic uncertainty affecting the differential cross section measurements in the l + jets channel at the parton level as a function of the top quark p_T (upper row) or |y| (lower row). Both the systematic uncertainties in the absolute (left column) and the normalized (right column) cross sections are shown. "JES + JER + b tagging" includes uncertainties due to the JES, JER, and small-R jet b tagging efficiency; "t tagging" is the uncertainty associated with the large-R jet t tagging efficiency; "Other experimental" includes the uncertainties originating from the background estimate, pileup modeling, lepton identification and trigger efficiency, and measurement of the integrated luminosity; "Parton shower" includes contributions from ISR and FSR, underlying event tune, ME-PS matching, and color reconnection; "Hard scattering" includes the uncertainty due to PDFs, as well as renormalization and factorization scales.

ACKNOWLEDGMENTS
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF   research Grants No. 123842, No. 123959, No. 124845, No. 124850, No. 125105, No. 128713, No. 128786, and No. 129058 (Hungary)