Searches for new physics using the ttbar invariant mass distribution in pp collisions at sqrt(s)=8 TeV

Searches for anomalous top quark-antiquark production are presented, based on pp collisions at sqrt(s) = 8 TeV. The data, corresponding to an integrated luminosity of 19.7 inverse femtobarns, were collected with the CMS detector at the LHC. The observed ttbar invariant mass spectrum is found to be compatible with the standard model prediction. Limits on the production cross section times branching fraction probe, for the first time, a region of parameter space for certain models of new physics not yet constrained by precision measurements.


With the discovery of a Higgs boson with a mass around 125 GeV [1][2][3], the focus of particle physics has shifted towards understanding the properties of the new boson, uncovering the nature of the underlying electroweak symmetry breaking (EWSB) mechanism, and finding new physics. The standard model (SM) is believed to be an effective theory, i.e., a low-energy approximation of a more complete theory incorporating gravity and explaining the origin of many parameters that are simply postulated within the SM. Many models beyond the SM (BSM) have been proposed in order to alleviate the hierarchy problem of the SM, which stems from the fact that quantum-loop corrections to the Higgs boson mass diverge quadratically with the highest energy scale of the model, requiring an enormous degree of fine tuning to ensure that the Higgs mass remains close to the W-boson mass up to the Planck scale. Since the largest quantum correction to the Higgs boson mass involves a top-quark loop, it is natural to suppose that these BSM mechanisms would involve interactions with the top quark.
Potential solutions to the hierarchy problem include models with extra spatial dimensions, either flat [4] or warped [5,6]. In these models, gravity is allowed to permeate the multidimensional space, which results in its apparent weakness from the point of view of an observer restricted to 3+1 dimensions. This effect lowers the highest energy scale in the SM from the Planck scale to the TeV scale, thus eliminating the hierarchy between the EWSB scale and the highest scale in the theory. Such models often contain Kaluza-Klein excitations of particles, including gravitons and gluons, both of which can have enhanced couplings to $t\bar{t}$ pairs [7]. Other new gauge bosons have been proposed, referred to generically as $Z'$, that also couple preferentially to $t\bar{t}$ pairs [8][9][10][11][12][13]. Furthermore, there may be additional spin-zero resonances that preferentially decay to $t\bar{t}$ pairs [13,14]. These various resonances may be observable as enhancements in the $t\bar{t}$ invariant mass spectrum.
Discrepancies have been observed in the forward-backward asymmetry of top quark production at the Tevatron [15]. Assuming this anomaly is due to new physics above the TeV scale, an enhancement of the $t\bar{t}$ rate at high invariant mass could be visible at the Large Hadron Collider (LHC) [16,17].
In this Letter, a search for anomalous production of $t\bar{t}$ events is presented, from data corresponding to an integrated luminosity of 19.7 fb$^{-1}$ of pp collisions at $\sqrt{s} = 8$ TeV, recorded with the Compact Muon Solenoid (CMS) detector [18] at the LHC. These results represent a significant improvement over the previous searches [19][20][21][22][23][24], due primarily to the large increase in the high-x parton luminosity from the higher LHC energy in 2012, but also because of the increased size of the data sample and the combination of several statistically independent channels. Specific comparisons are made to the resonant production of Randall-Sundrum Kaluza-Klein (RS KK) gluons [7], of a $Z'$ boson in the topcolor model [10], and of a scalar Higgs-like boson produced via gluon fusion through its couplings to the top quarks. In addition, enhancements of the $t\bar{t}$ invariant mass ($M_{t\bar{t}}$) spectrum are constrained for $M_{t\bar{t}} > 1$ TeV. These results probe, for the first time, a region of parameter space of models with warped extra dimensions not yet constrained by precision measurements [25].
Since the top quark decays primarily to a W boson and a bottom (b) quark, top pair production signatures are classified based on whether the W bosons decay to leptons or quarks. This measurement combines analyses utilizing the final states where one or both W bosons from $t\bar{t}$ events decay to quarks ("semi-leptonic" and "all-hadronic" events, respectively). The events are classified into two categories based on the expected kinematics of the top-quark decay products. In the first category, the $t\bar{t}$ pair is produced near the kinematic threshold, resulting in a topology where each parton is matched to a single jet ("resolved topology"). In the second category, each top quark is produced with a high Lorentz boost (>2), resulting in collimated decay products that may be clustered into a single jet ("boosted topology"). The transition between the resolved and boosted topologies occurs around $M_{t\bar{t}} = 1$ TeV. Both the resolved and boosted topologies are used to analyze the semi-leptonic events. However, all-hadronic events are analyzed only in the boosted topology, which is combined with the semi-leptonic boosted events to perform the search. As the all-hadronic analysis is dominated by multijet events, jet substructure criteria are imposed to further enhance sensitivity. The analysis techniques are similar to those explored in earlier analyses of pp collision data [20,23].
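As a rough illustration of why the transition falls near 1 TeV, consider the idealized case of a resonance of mass $M$ produced at rest and decaying to $t\bar{t}$, with the top quarks emitted centrally. The top-quark mass $m_t \approx 173$ GeV is an assumed value that is not quoted in the text, and the opening-angle estimate $\Delta R \approx 2 m_t / p_T$ is a common rule of thumb rather than a result of this analysis:

\[
\gamma = \frac{E_t}{m_t} = \frac{M}{2 m_t} \approx \frac{1000}{2 \times 173} \approx 2.9, \qquad
p_T \approx \sqrt{(M/2)^2 - m_t^2} \approx 469~\text{GeV}, \qquad
\Delta R \approx \frac{2 m_t}{p_T} \approx 0.74 .
\]

Under these assumptions, a 1 TeV resonance yields top quarks with a boost above 2 whose decay products span an angular region somewhat below unity, which is consistent with the onset of the boosted topology quoted above.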
The CMS detector, a general-purpose apparatus operating at the CERN LHC, is described in detail elsewhere [18]. The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the superconducting solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass/scintillator hadron calorimeter. Muons are measured in gas-ionization detectors embedded in the steel magnetic flux return yoke outside the solenoid.
At the CMS experiment, the polar angle $\theta$ is measured from the beam direction and the azimuthal angle $\phi$ is measured in the plane perpendicular to the beam direction. The rapidity $y$ is approximated by the pseudorapidity $\eta$, defined as $\eta = -\ln[\tan(\theta/2)]$. The momentum component transverse to the beamline is denoted as $p_T$.
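The coordinate definitions above can be sketched in a few lines of Python. The function names are hypothetical and chosen only for illustration; angles are assumed to be in radians and momenta in arbitrary but consistent units.

```python
import math


def pseudorapidity(theta: float) -> float:
    """Return eta = -ln[tan(theta/2)] for a polar angle 0 < theta < pi (radians)."""
    return -math.log(math.tan(theta / 2.0))


def transverse_momentum(px: float, py: float) -> float:
    """Return the momentum component transverse to the beam (z) axis."""
    return math.hypot(px, py)


# A particle emitted perpendicular to the beam has eta close to zero,
# and eta grows in magnitude as the particle approaches the beamline.
eta_central = pseudorapidity(math.pi / 2.0)
eta_forward = pseudorapidity(0.1)
```

With these definitions, a requirement such as $|\eta| < 2.4$ corresponds to excluding polar angles within roughly 10 degrees of the beamline.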
The CMS experiment uses a particle-flow-based event reconstruction [26], which aggregates information from all subdetectors, including charged-particle tracks from the tracking system and deposited energy from the electromagnetic and hadron calorimeters. Given this information, all detected particles in the event are reconstructed as electrons, muons, photons, neutral hadrons, or charged hadrons. For this paper, electrons and photons are required to have $|\eta| < 2.5$, and muons, $|\eta| < 2.1$. The leading hard-scattering vertex of the event is defined as the vertex whose tracks have the largest sum of squared transverse momenta. Charged hadrons associated with other vertices are removed from further consideration. The remaining candidates are clustered into jets using the FASTJET 3.0 software package [27]. The semi-leptonic analyses use the anti-$k_T$ jet-clustering algorithm [28] with a size parameter of 0.5 (AK5 jets), while the all-hadronic analysis uses the Cambridge-Aachen (CA) jet-clustering algorithm [29,30] with a size parameter of 0.8 (CA8 jets) to take advantage of the capability of the CA algorithm to distinguish jet substructure. Jets are required to satisfy $|\eta| < 2.4$. Jets are identified as b-quark jets if they satisfy the combined secondary vertex algorithm defined in Ref. [31].
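The choice of the leading hard-scattering vertex can be illustrated with a minimal sketch. It assumes that the "squared-sum" criterion means the sum of the squared track transverse momenta, and it represents each vertex simply as a list of track $p_T$ values; both the data layout and the function name are hypothetical.

```python
from typing import List, Sequence


def leading_vertex(vertices: Sequence[Sequence[float]]) -> int:
    """Return the index of the vertex with the largest sum of track pT^2.

    Each element of `vertices` is the list of track pT values (GeV)
    associated with one reconstructed vertex.
    """
    if not vertices:
        raise ValueError("no reconstructed vertices in the event")
    return max(
        range(len(vertices)),
        key=lambda i: sum(pt * pt for pt in vertices[i]),
    )


# Example: the second vertex has a single hard track and wins over a
# vertex with many soft tracks, as expected for a hard scattering.
example: List[List[float]] = [[1.0, 2.0], [3.0], [0.5] * 10]
chosen = leading_vertex(example)
```

Squaring the transverse momenta favors vertices with a few hard tracks over pileup vertices with many soft tracks, which is the motivation usually given for this kind of criterion.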
The data for the semi-leptonic resolved category were collected with triggers requiring a single isolated muon or electron with a $p_T$ threshold of 17 or 25 GeV, respectively, in combination with three jets with $p_T > 30$ GeV. Offline, we select events containing exactly one isolated muon (with $p_T^{\mu} > 26$ GeV), or electron (with $p_T^{e} > 30$ GeV), and at least four jets with $p_T > 70, 50, 30, 30$ GeV, respectively. The non-W multijet background (NWMJ) is suppressed further by requiring the transverse missing momentum $E_T^{\text{miss}}$, the modulus of the vector sum of all measured particle $p_T$, to be larger than 20 GeV.
For the semi-leptonic boosted category, data were recorded with triggers requiring one muon ($p_T^{\mu} > 40$ GeV), or one electron ($p_T^{e} > 35$ GeV) in conjunction with two jets ($p_T > 100, 25$ GeV). Since the top-quark decay products can be collinear in this regime, no isolation requirements on the leptons are imposed in either the trigger or offline selections. Offline, we select events containing exactly one muon with $p_T^{\mu} > 45$ GeV, or exactly one electron with $p_T^{e} > 35$ GeV, and at least two jets with $p_T > 150, 50$ GeV, respectively. To reduce the contamination of NWMJ processes, we follow the techniques of Ref. [23], placing requirements on the angle and relative momentum between the lepton and the nearest jet, and also requiring $E_T^{\text{miss}} > 50$ GeV and T > 150 GeV. The events satisfying the two semi-leptonic selections are separated into categories determined by the lepton flavor (electron or muon) and the number of b-tagged jets $N_{b\text{-tag}}$ (1 or ≥2 b-tagged jets for the resolved analysis, and 0 or ≥1 b-tagged jets for the boosted analysis). The purpose of this classification is to separate the sample into regions dominated by different background processes. The events in the categories with fewer b-tagged jets have a higher fraction of W+jets events, whereas those with more b-tagged jets have a higher fraction of $t\bar{t}$ events. This characterization is used to constrain the various background components by imposing self-consistency among the channels. The reconstruction of semi-leptonic $t\bar{t}$ candidates relies on a $\chi^2$ variable built by enforcing kinematic consistency (within uncertainties) with the $t\bar{t}$ hypothesis, imposing constraints on the reconstructed W and top candidates. In the semi-leptonic boosted regime we follow the techniques of Ref. [23], and allow candidates with more than one parton merged into a single jet.
For the boosted all-hadronic analysis, data were recorded with a trigger requiring the scalar sum of the transverse momenta of reconstructed AK5 jets to be greater than 750 GeV. In the offline analysis selection, we require two CA8 jets, each with $p_T > 400$ GeV.
The reconstruction of the boosted all-hadronic analysis relies on "top-tagging" techniques similar to those used in the previous analysis [20]. This algorithm [32], aiming to identify the top quark decay products within CA8 jets, reverses the jet-clustering sequence by iteratively separating the jet into subjets until three or four subjets with sufficient $p_T$ are found. The algorithm is validated on a sample of $t\bar{t}$ events selected by requiring one muon and additional jets, as shown in Fig. 3 in the Appendix. The reconstructed single-jet top quark mass is found to be consistent with the expectation from simulated events, as is the di-subjet W mass, obtained from the minimum mass pairing of the leading three subjets. The two selected top-tagged jets are then required to be back-to-back, with $|\Delta\phi| > \pi/2$ and $|\Delta y| < 1.0$, to suppress non-top multijet (NTMJ) backgrounds.
SM top-quark production is modeled with the next-to-leading-order (NLO) generator POWHEG (v1.0) [33], interfaced with PYTHIA 6 (v6.2.24) [34] for parton showering with tune Z2* [35]. MADGRAPH (v5.1.1) [36] interfaced to PYTHIA 6 is used for simulating W and Z boson production in association with jets. Diboson processes are generated with PYTHIA 6 to compute both the matrix element and showering.
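The back-to-back requirement on the two top-tagged jets can be sketched as follows. The only subtlety is that the azimuthal difference must be wrapped into the interval $(-\pi, \pi]$ before the absolute value is taken. The function names are hypothetical, and the thresholds are those quoted in the text.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Return the azimuthal difference phi1 - phi2 wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def is_back_to_back(phi1: float, y1: float, phi2: float, y2: float) -> bool:
    """Apply |delta phi| > pi/2 and |delta y| < 1.0 to a pair of jets."""
    return abs(delta_phi(phi1, phi2)) > math.pi / 2.0 and abs(y1 - y2) < 1.0


# Two jets at phi = 3.0 and phi = -3.0 are close in azimuth once the
# periodicity is taken into account, so they fail the requirement.
wrapped_pair_passes = is_back_to_back(3.0, 0.0, -3.0, 0.0)
```

Without the wrapping step, the pair in the example would appear to be separated by 6.0 radians and would be accepted incorrectly.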
The MADGRAPH-PYTHIA 6 combination is also used to generate signal Monte Carlo (MC) simulation events for limit setting, including high-mass SM-like $Z'$ resonances with $\Gamma_{Z'}/M_{Z'} = 1\%$ and $\Gamma_{Z'}/M_{Z'} = 10\%$, where $\Gamma_{Z'}$ is the width of the resonance, and $M_{Z'}$ is the mass. This relative width can be compared to the detector resolution of about 10% for a $t\bar{t}$ resonance mass. Hence, limits set for the $Z'$ with a width of 1% would apply to a larger class of models in which the resonance width is below the experimental resolution. The MADGRAPH-PYTHIA 6 combination is also used to generate a simplified model of a spin-zero resonance produced via gluon fusion through its couplings to top quarks, with the SM interference effects neglected in the model. A Kaluza-Klein excitation of a gluon with a width of approximately 15-20% [7] is generated with PYTHIA 8 (v1.5.3) [37].
The leading-order (LO) CTEQ6L parton distribution functions [38] are used, except for the generation of the POWHEG samples, which use the NLO CT10 parton distribution function [39]. All MC samples include additional collisions per beam crossing and are reweighted to match the data-taking conditions as well as the identification and trigger efficiencies measured in control samples.
For the semi-leptonic resolved analysis, an aggregate background estimate is taken for all SM components together directly from the data, with the SM $t\bar{t}$ component the dominant one. The number of signal events is extracted from a binned maximum likelihood fit to the $M_{t\bar{t}}$ distribution, assuming a smoothly falling probability density function (pdf) for the SM backgrounds and a parameterization of the signal pdf based on a Breit-Wigner shape. Only events with $M_{t\bar{t}} > 550$ GeV are considered; below this value the SM backgrounds are not described by a smoothly falling pdf.
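The structure of such a binned fit can be sketched as below. This is a minimal illustration, not the fit used in the analysis: the exponential background form, the non-relativistic Breit-Wigner without resolution smearing, and all function and parameter names are assumptions made for the example, since the text does not specify the functional forms.

```python
import numpy as np


def breit_wigner(m, mass, width):
    """Non-relativistic Breit-Wigner density (assumed signal shape)."""
    return (width / (2.0 * np.pi)) / ((m - mass) ** 2 + (width / 2.0) ** 2)


def falling_background(m, slope):
    """Exponentially falling shape (assumed form; not from the text)."""
    return np.exp(-slope * m)


def expected_counts(edges, n_sig, n_bkg, mass, width, slope):
    """Expected events per bin for a signal-plus-background model.

    Shapes are evaluated at bin centers and normalized to unit area
    within the fit range, then scaled by the signal and background yields.
    """
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_widths = np.diff(edges)
    sig = breit_wigner(centers, mass, width) * bin_widths
    bkg = falling_background(centers, slope) * bin_widths
    return n_sig * sig / sig.sum() + n_bkg * bkg / bkg.sum()


def binned_nll(observed, expected):
    """Poisson negative log-likelihood, omitting the constant log(n!) term."""
    expected = np.clip(expected, 1e-12, None)
    return float(np.sum(expected - observed * np.log(expected)))


# Fit range starts at 550 GeV, as in the text; other numbers are illustrative.
edges = np.linspace(550.0, 3000.0, 50)
model = expected_counts(edges, 50.0, 1000.0, 1500.0, 150.0, 0.004)
```

In a real fit, `binned_nll` would be minimized over the signal yield and the background parameters with a numerical minimizer; the sketch only shows how the likelihood is assembled from the two pdfs.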
The $M_{t\bar{t}}$ distributions of the semi-leptonic and all-hadronic boosted topologies are fitted together in a single joint likelihood maximization, imposing consistency of the various background and signal components across the two channels. The initial estimates for the SM $t\bar{t}$, single-top-quark, W+jets, and diboson production are based on simulation after applying data-MC corrections based on control samples in data. The boosted all-hadronic analysis has one additional background component, the NTMJ background, which is estimated using the probability to misidentify a light-parton jet as a top-quark jet measured in data. This probability is derived as a function of the jet $p_T$ in a sample enriched in light-quark jets, kinematically similar to the signal region. It is then used to weight events in the signal region. Furthermore, the efficiency for identifying true top-quark jets is corrected in the signal MC simulations using measurements in a signal-depleted sideband region containing events with one isolated muon and additional jets. It is found that the efficiencies in data and MC simulations agree, having a ratio of (93 ± 5)%. The methods described above were validated using simulated samples and it was verified that signal contamination was minimal in the signal-depleted regions.
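The data-driven NTMJ estimate can be illustrated with a short sketch. It assumes a simple binned mistag rate in jet $p_T$ and applies it as a per-event weight to a probe jet; the binning, the handling of overflow jets, and the function names are hypothetical choices for the example rather than details taken from the analysis.

```python
import numpy as np


def mistag_rate(pt_all, pt_tagged, edges):
    """Per-bin fraction of control-sample jets that pass the top tag."""
    n_all, _ = np.histogram(pt_all, bins=edges)
    n_tag, _ = np.histogram(pt_tagged, bins=edges)
    rate = np.zeros(len(n_all), dtype=float)
    filled = n_all > 0  # leave empty bins at zero to avoid division by zero
    rate[filled] = n_tag[filled] / n_all[filled]
    return rate


def ntmj_weights(pt_probe, rate, edges):
    """Look up the mistag rate for each probe jet as an event weight.

    Jets outside the binned range are assigned to the nearest edge bin.
    """
    idx = np.clip(np.digitize(pt_probe, edges) - 1, 0, len(rate) - 1)
    return rate[idx]


# Illustrative numbers only: jet pT values in GeV.
edges = np.array([400.0, 500.0, 600.0, 800.0])
pt_all = np.array([410.0, 420.0, 430.0, 440.0, 510.0, 520.0, 650.0])
pt_tagged = np.array([410.0, 510.0])
rate = mistag_rate(pt_all, pt_tagged, edges)
weights = ntmj_weights(np.array([450.0, 550.0, 700.0, 900.0]), rate, edges)
```

Summing such weights in bins of $M_{t\bar{t}}$ gives a predicted background shape and normalization without relying on multijet simulation.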
In the likelihood maximization, systematic uncertainties are treated as nuisance parameters. Those that are common among the channels are treated as 100% correlated, while those that are channel-specific are treated as uncorrelated. The normalizations of the backgrounds are allowed to vary within log-normal constraints in the maximization of the joint likelihood. The shapes of the backgrounds are also allowed to vary within their uncertainties. The shapes and normalizations also account for systematic variations due to efficiency and misidentification rates. The constraints used in the joint likelihood maximization are listed in Table 1.
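A log-normal normalization constraint is commonly implemented by letting a unit-Gaussian nuisance parameter scale the yield multiplicatively. The sketch below shows that convention; it is an assumed parameterization for illustration, and the names `kappa` and `theta` do not come from the text.

```python
def lognormal_scale(kappa: float, theta: float) -> float:
    """Yield multiplier kappa**theta for a nuisance parameter theta.

    kappa = 1 + relative uncertainty (e.g. 1.15 for 15%); theta = 0 gives
    the nominal yield and theta = +1 or -1 a one-sigma shift.
    """
    return kappa ** theta


def constraint_penalty(theta: float) -> float:
    """Negative log of a unit-Gaussian constraint on theta (constant dropped)."""
    return 0.5 * theta * theta


def constrained_yield(nominal: float, kappa: float, theta: float) -> float:
    """Background yield after applying the log-normal variation."""
    return nominal * lognormal_scale(kappa, theta)


# A 15% uncertainty pulled up by one sigma scales 1000 events to 1150.
shifted = constrained_yield(1000.0, 1.15, 1.0)
```

The multiplicative form keeps the yield positive for any value of the nuisance parameter, which is the usual reason for preferring a log-normal over a Gaussian constraint on normalizations.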
The event yields from the various background components and data are shown in Table 2. The yields of the simulated samples are quoted after the likelihood maximization procedure, and the individual background uncertainties include only the uncertainty in the individual normalization. The total SM contribution includes all uncertainties, including the correlations not quoted in the individual components. Figure 1 shows the $M_{t\bar{t}}$ distributions for all channels along with the expectation from a $Z'$ signal.
In all cases, the data are well described by the SM-only background hypothesis. The absence of a signal in the $M_{t\bar{t}}$ distribution is quantified by deriving Bayesian upper limits on the signal cross section times branching fraction at 95% confidence level (CL), using pseudo-experiments. The resolved semi-leptonic analysis has some overlapping phase space with the boosted semi-leptonic analysis. There is a transition point (∼1 TeV) where the expected sensitivities of the boosted and resolved analyses are equal; the boosted analysis result is quoted above this point and the resolved analysis result below it.
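The idea behind a Bayesian upper limit can be sketched for the simplest possible case: a single-bin counting experiment with a known background and a flat prior on the signal yield. The analysis itself uses the full $M_{t\bar{t}}$ shape with nuisance parameters, so the code below is only a simplified illustration under those stated assumptions, with hypothetical function and parameter names.

```python
import numpy as np


def bayesian_upper_limit(n_obs, bkg, cl=0.95, n_grid=20001):
    """Upper limit on the signal yield s for a Poisson counting experiment.

    Uses a flat prior on s >= 0, so the posterior is proportional to the
    Poisson likelihood of n_obs events given a mean of s + bkg. The
    posterior is integrated numerically on a uniform grid in s.
    """
    s_max = n_obs + 10.0 * (np.sqrt(n_obs + 1.0) + 1.0)  # generous range
    s = np.linspace(0.0, s_max, n_grid)
    mu = np.clip(s + bkg, 1e-300, None)  # guard against log(0)
    log_like = n_obs * np.log(mu) - mu   # constant log(n!) term dropped
    posterior = np.exp(log_like - log_like.max())
    cdf = np.cumsum(posterior)
    cdf /= cdf[-1]
    return float(s[np.searchsorted(cdf, cl)])


# With zero observed events and no background, the posterior is exp(-s)
# and the 95% limit is close to -ln(0.05), i.e. about 3 events.
limit_zero = bayesian_upper_limit(0, 0.0)
```

Dividing a yield limit of this kind by the integrated luminosity and the signal efficiency converts it into a limit on the cross section times branching fraction.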
Figure 2 shows the expected and observed limits for a narrow resonance, as a function of the invariant mass of the resonance. The specific example shown in Fig. 2 and given by the dashed line refers to a topcolor $Z'$ with $\Gamma_{Z'}/M_{Z'} = 1.2\%$ based on predictions from Ref. [10]. The cross section limits for this case are obtained from the MC models with $\Gamma_{Z'}/M_{Z'} = 1.0\%$, scaled by the ratio of theoretical cross sections. This scaling is done to compare to theoretical results and previous measurements. As the cross section calculation is available for this model at LO only, the predictions are multiplied by a factor of 1.3 to account for higher-order effects [43]. The vertical dash-dotted line indicates the transition between the resolved and boosted analyses.
Table 3 shows additional model-specific limits. The combination of the semi-leptonic and all-hadronic boosted analyses improves the expected cross section limits at 2 TeV by ∼25%. Compared to the results of previous analyses [20][21][22][23] for specific models [7,10], the lower limits on the masses of these resonances have been improved by several hundred GeV. For the semi-leptonic resolved analysis, assuming a spin-zero resonance with narrow width, produced via gluon fusion with no interference with the SM background, the cross section limits are 0.8 pb and 0.3 pb for a spin-zero resonance of mass 500 GeV and 750 GeV, respectively. These are the first limits at CMS for heavy Higgs-like particles decaying into $t\bar{t}$.
In addition to investigating possible resonant structures in the $M_{t\bar{t}}$ spectrum, the presence of new physics that causes a non-resonant enhancement of the $M_{t\bar{t}}$ spectrum is also tested. The boosted all-hadronic analysis is used to set limits on such new production for events with $M_{t\bar{t}} > 1$ TeV, since the NTMJ background can be predicted entirely from data. The limit is expressed as a ratio of the total SM + BSM $t\bar{t}$ cross section to the SM-only cross section ($S$, as defined in Ref. [20]). The efficiency to select SM $t\bar{t}$ events with $M_{t\bar{t}} > 1$ TeV is $(3.4 \pm 1.7) \times 10^{-4}$. We find $S < 1.2$ at the 95% CL, with a credible interval of 1.1-2.0 at 68% CL, a factor of two improvement over the previously published limits [20].
In summary, we have performed searches for anomalous $t\bar{t}$ production using events in the semi-leptonic and all-hadronic topologies. In addition to new limits on non-resonant enhancements to top-quark production, limits are set on the production cross section times branching fraction for several resonance hypotheses, for resonances in the mass range 0.5-3.0 TeV.

A Supplemental Material
We provide additional plots that illustrate the top-tagging procedure used in the analysis presented in this Letter. Figure 3 shows the distribution of single-jet masses in a selection optimized to identify partially merged top quark decays. In this topology, the W boson decay products will be merged into a single jet, but the b quark will escape. The figure also shows the jet mass distribution for fully merged boosted top quarks. The reconstruction of these kinematic observables, in a sample enriched in $t\bar{t}$ events in data, serves to validate the top-tagging algorithm. Further details are given in Ref. [20] of the Letter. We also repeat the plots from Fig. 1 of the main text here in Fig. 4, including ratios to better show the agreement between data and expectation.

Figure 1: Comparison between data and SM prediction for reconstructed $M_{t\bar{t}}$ distributions for the boosted semi-leptonic analysis with 0 b-tagged jets (a) and ≥1 b-tagged jets (b), as well as for the all-hadronic analysis (c). For the semi-leptonic analyses, "others" refers to all non-top backgrounds, while for the all-hadronic analysis, "NTMJ" refers to the "non-top multijet" background. The shaded band corresponds to the SM background uncertainty. The likelihood fit projection on data for the semi-leptonic resolved analysis is shown in (d). A cross section of 1.0 pb is used for the normalization of the $Z'$ samples.

Figure 2: The 95% CL upper limits on the production cross section times branching fraction as a function of $M_{t\bar{t}}$ for $Z'$ resonances with $\Gamma_{Z'}/M_{Z'} = 1.2\%$ compared to predictions from Ref. [10] multiplied by 1.3 to account for higher-order effects [43].

Figure 3: Jet mass distribution for fully merged W decay products (a) and fully merged top quark candidate jets (b), in the muon + jets selection. The shaded band corresponds to the total SM background uncertainty.

Figure 4: Plots from Fig. 1, including plots of the ratio of data and total expected background for each analysis channel.

Table 1: Constraints used in the likelihood maximization. The $M_{t\bar{t}}$ distributions of the boosted channels are combined into a single joint likelihood, imposing consistency of the various background and signal components.

Table 2: Number of expected and observed events in the boosted analyses.

Table 3: 95% CL lower limits on the masses of new particles in specific models.