Search for supersymmetry in pp collisions at $\sqrt{s} = 8$ TeV in final states with boosted W bosons and b jets using razor variables

A search for supersymmetry in hadronic final states with highly boosted W bosons and b jets is presented, focusing on compressed scenarios. The search is performed using proton-proton collision data at a center-of-mass energy of 8 TeV, collected by the CMS experiment at the LHC, corresponding to an integrated luminosity of 19.7 fb$^{-1}$. Events containing candidates for hadronic decays of boosted W bosons are identified using jet substructure techniques and are analyzed using the razor variables $M_R$ and $R^2$, which characterize a possible signal as a peak on a smoothly falling background. The observed event yields in the signal regions are found to be consistent with the expected contributions from standard model processes, which are predicted using control samples in the data. The results are interpreted in terms of gluino-pair production followed by the exclusive decay of each gluino into a top squark and a top quark. The analysis excludes gluino masses up to 1.1 TeV for light top squarks decaying solely to a charm quark and a neutralino, and up to 700 GeV for heavier top squarks decaying solely to a top quark and a neutralino.


I. INTRODUCTION
The CERN LHC has provided sufficient data to probe a large variety of theories beyond the standard model (SM). Among these, theories based on supersymmetry (SUSY) [1][2][3][4][5][6][7][8][9], which predict the existence of a spectrum of supersymmetric partners to the SM particles, are strongly motivated. Scenarios with nondegenerate supersymmetric particle spectra, with cross sections as low as ≈1 fb, have been explored in many final states; however, as yet no evidence for SUSY has been found.
The focus of many current searches is so-called natural SUSY [10,11], in which the Higgs boson mass can be stabilized without excessive fine-tuning. In natural SUSY scenarios, the Higgsino mass parameter μ is required to be of the order of 100 GeV, and the lightest top squark $\tilde{t}_1$, the gluino $\tilde{g}$, and the lightest bottom squark $\tilde{b}_1$ are constrained to have masses around the TeV scale, while the masses of the other superpartners are unconstrained and can be much heavier, beyond the reach of the LHC. The possibility that the top squark could be light has motivated several searches for this sparticle by the ATLAS and CMS collaborations [12][13][14][15][16][17][18][19][20][21][22][23]. In general, the sensitivity of these searches to direct top squark production diminishes when the mass of the top squark approaches that of the lightest supersymmetric particle (LSP), which is assumed to be the lightest neutralino $\tilde{\chi}^0_1$. For searches that specifically target the decay $\tilde{t}_1 \to t\tilde{\chi}^0_1$, the sensitivity is reduced when the mass difference Δm between the top squark and the LSP is comparable to the top quark mass $m_t$.
Here, we focus on two types of scenarios: the so-called compressed spectrum, in which Δm is very small, of the order of a few GeV to tens of GeV (e.g., [24][25][26]), and scenarios where Δm ≈ $m_t$. In the compressed case, the top squark decays to the LSP and soft decay products, which are difficult to detect. When Δm ≈ $m_t$, the signature of top squark production is very similar to that of $t\bar{t}$ production, which has a much higher cross section. Therefore, to be sensitive to such processes, we cannot rely solely on the top squark decay products. Possible ways to discriminate the signal include tagging top squark events by a jet from initial-state radiation (ISR), using the monojet signature [27,28], or searching for top squarks produced in cascade decays of heavier particles, such as the heavy top squark decay $\tilde{t}_2 \to \tilde{t}_1 + H/Z$ [21], or in gluino decays.
In this paper, we search for the challenging top squark final states described above in gluino decays. Specifically, we consider gluino-pair production where each gluino decays to a top squark and a top quark. We consider scenarios in which the gluino has a mass of around 1 TeV and the lighter top squark has a mass of a few hundred GeV. Because of the large mass gap between the gluino and the top squark, the top quark from the gluino decay receives a large boost. The top squark decays to $c\tilde{\chi}^0_1$ for small Δm, or to $t\tilde{\chi}^0_1$ for Δm ≈ $m_t$, as in the targeted searches for $\tilde{t}_1 \to t\tilde{\chi}^0_1$ mentioned above. The analysis described in this paper is especially sensitive to the decay $\tilde{t}_1 \to c\tilde{\chi}^0_1$ and therefore provides new information about the viability of natural SUSY.
In light of the discussion above, boosted top quarks are expected to be a promising signature of new physics involving a massive gluino decaying to a relatively light top squark. The decay products of a boosted object with high transverse momentum $p_T$ are merged, being separated by $\Delta R \approx 2m/p_T$, where m denotes the mass of the decaying particle. For the top quark decay products to be merged within the typical jet size of ΔR = 0.5 requires a top quark momentum of ≈700 GeV, a value difficult to reach in proton-proton collisions at 8 TeV. Therefore, to increase the signal efficiency by entering the boosted regime, we focus on W bosons from top quark decays, which require a more accessible $p_T$ of around 300 GeV. The targeted final state therefore contains boosted W bosons and jets originating from b quarks (b jets) from top quark decays, light-quark jets from unmerged hadronic W boson decay products or from charm quarks, and missing transverse momentum from the neutralinos. Hadronically decaying boosted W boson candidates are identified using the pruned jet mass [35][36][37] and a jet substructure observable called N-subjettiness [38]. The razor kinematic variables $M_R$ and $R^2$ [39] are used to discriminate processes involving new heavy particles from SM processes in final states with jets and missing transverse energy. To increase the sensitivity to new physics, we perform the analysis by partitioning the ($M_R$, $R^2$) plane into multiple bins.

This paper is organized as follows. The razor variables are introduced in Sec. II. Section III gives a brief overview of the CMS detector, while Sec. IV covers the triggers, data sets, and Monte Carlo (MC) simulated samples used in this analysis. Details of the object definitions and event selection are given in Secs. V and VI, respectively. Section VII describes the data/simulation scale factors that are needed to correct the modeling of the boosted W boson tagger. The statistical analysis is explained in Sec. VIII, and Sec. IX covers the systematic uncertainties. Finally, our results and their interpretation are presented in Sec. X, followed by a summary in Sec. XI.
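The boost argument in the introduction can be checked with a few lines of arithmetic. The sketch below is a minimal illustration that inverts the rule of thumb ΔR ≈ 2m/p_T to find the p_T at which two-body decay products fit inside a jet of a given radius. The masses (about 173 GeV for the top quark and 80.4 GeV for the W boson) are approximate values we assume here; they are not quoted in this section, and the function names are hypothetical.

```python
# Rule of thumb from the text: decay products of a particle of mass m and
# transverse momentum pT are separated by roughly dR ~ 2 m / pT.
# The masses below are approximate values assumed for illustration.
W_MASS_GEV = 80.4
TOP_MASS_GEV = 173.0


def opening_angle(mass_gev: float, pt_gev: float) -> float:
    """Approximate angular separation of the two-body decay products."""
    return 2.0 * mass_gev / pt_gev


def min_pt_for_merging(mass_gev: float, jet_radius: float) -> float:
    """Smallest pT for which dR ~ 2m/pT fits inside a jet of the given radius."""
    return 2.0 * mass_gev / jet_radius


for name, mass in [("top", TOP_MASS_GEV), ("W", W_MASS_GEV)]:
    for radius in (0.5, 0.8):
        threshold = min_pt_for_merging(mass, radius)
        print(f"{name}, R = {radius}: pT >~ {threshold:.0f} GeV")
```

For R = 0.5 this gives 692 GeV for the top quark and about 322 GeV for the W boson, consistent with the ≈700 GeV and ≈300 GeV quoted above; for R = 0.8 the W boson threshold drops to about 200 GeV.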

II. RAZOR VARIABLES
The razor variables $M_R$ and $R^2$ [39] are useful for describing a signal arising from the pair production of heavy particles, each of which decays to a massless visible particle and a massive invisible particle. In the two-dimensional razor plane, a signal with heavy particles is expected to appear as a peak on top of smoothly falling SM backgrounds, which can be empirically described using exponential functions. For this reason, the razor variables are robust discriminators for SUSY signals in which supersymmetric particles are pair produced and decay to SM particles and the LSP. For the simple case in which the final state comprises two visible particles, e.g., jets, the razor variables are defined using the momenta $\vec{p}^{\,j_1}$ and $\vec{p}^{\,j_2}$ of the two jets as

$$M_R \equiv \sqrt{\left(|\vec{p}^{\,j_1}| + |\vec{p}^{\,j_2}|\right)^2 - \left(p_z^{j_1} + p_z^{j_2}\right)^2}, \qquad (1)$$

$$M_T^R \equiv \sqrt{\frac{E_T^{\rm miss}\left(p_T^{j_1} + p_T^{j_2}\right) - \vec{p}_T^{\,\rm miss} \cdot \left(\vec{p}_T^{\,j_1} + \vec{p}_T^{\,j_2}\right)}{2}}, \qquad (2)$$

where $p_z^{j_{1,2}}$ are the z components of the $j_{1,2}$ momenta, $\vec{p}_T^{\,\rm miss}$ is the missing transverse momentum, computed as the negative vector sum of the transverse momenta of all observed particles in the event, and $E_T^{\rm miss}$ is its magnitude (see Sec. V for a more precise definition). Given $M_R$ and the transverse quantity $M_T^R$, the razor dimensionless ratio is defined as

$$R \equiv \frac{M_T^R}{M_R}. \qquad (3)$$

If the heavy mother particle is denoted by G and the heavy invisible daughter particle by χ, the peak of the $M_R$ distribution and the end point of the $M_T^R$ distribution are both estimates of the quantity $(m_G^2 - m_\chi^2)/m_G$.

V. KHACHATRYAN et al. PHYSICAL REVIEW D 93, 092009 (2016)

When the decay chains are complicated, producing multiple particles in the final state, the razor variables can still be meaningfully calculated by reducing the final state to a two-"megajet" structure. The megajet algorithm aims to cluster visible particles coming from the decays of the same heavy supersymmetric particle. The razor variables $M_R$ and $R^2$ are computed using the four-momenta of the two megajets, where the megajet four-momentum is the sum of the four-momenta of the particles comprising the megajet.
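The definitions of $M_R$, $M_T^R$, and $R$ translate directly into code. The sketch below is a minimal illustration, assuming each (mega)jet is supplied as a momentum vector (px, py, pz) in GeV and the missing transverse momentum as (px, py); all function names are hypothetical. The razor thresholds $M_R > 800$ GeV and $R^2 > 0.08$ are the baseline values of Sec. VI.

```python
import math
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]  # (px, py, pz) in GeV
Vec2 = Tuple[float, float]         # (px, py) in GeV


def missing_pt(visible: Iterable[Vec3]) -> Vec2:
    """Negative vector sum of the transverse momenta of all visible objects."""
    objs = list(visible)
    return (-sum(p[0] for p in objs), -sum(p[1] for p in objs))


def m_r(j1: Vec3, j2: Vec3) -> float:
    """M_R from the full momenta of the two (mega)jets."""
    p1 = math.sqrt(j1[0] ** 2 + j1[1] ** 2 + j1[2] ** 2)
    p2 = math.sqrt(j2[0] ** 2 + j2[1] ** 2 + j2[2] ** 2)
    # |p1| + |p2| >= |p1z + p2z|, so the argument is non-negative up to rounding.
    return math.sqrt(max((p1 + p2) ** 2 - (j1[2] + j2[2]) ** 2, 0.0))


def m_t_r(j1: Vec3, j2: Vec3, met: Vec2) -> float:
    """Transverse razor mass M_T^R."""
    et_miss = math.hypot(met[0], met[1])
    pt_sum = math.hypot(j1[0], j1[1]) + math.hypot(j2[0], j2[1])
    dot = met[0] * (j1[0] + j2[0]) + met[1] * (j1[1] + j2[1])
    return math.sqrt(max(et_miss * pt_sum - dot, 0.0) / 2.0)


def razor_r2(j1: Vec3, j2: Vec3, met: Vec2) -> float:
    """R^2 = (M_T^R / M_R)^2."""
    return (m_t_r(j1, j2, met) / m_r(j1, j2)) ** 2


def passes_baseline_razor(j1: Vec3, j2: Vec3, met: Vec2) -> bool:
    """Razor part of the baseline selection: M_R > 800 GeV and R^2 > 0.08."""
    return m_r(j1, j2) > 800.0 and razor_r2(j1, j2, met) > 0.08
```

The `max(..., 0.0)` guards only protect against tiny negative arguments from floating-point rounding; analytically both arguments are non-negative.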
Studies show that, of all possible clusterings, the one that minimizes the sum of the squared invariant masses of the megajets maximizes the efficiency with which particles are matched to their heavy supersymmetric ancestor [40]. Figure 2 shows the simulated distributions of the overall SM background and of a T1ttcc signal with $m_{\tilde{g}} = 1$ TeV, $m_{\tilde{t}} = 325$ GeV, and $m_{\tilde{\chi}^0_1} = 300$ GeV in the ($M_R$, $R^2$) plane. The binning is chosen in accordance with the exponentially falling behavior of the razor variables, to optimize the statistical precision in each bin. The numerical values of the bin boundaries, which are used throughout the analysis, are given in Table V. The SM background, which mainly arises from multijet production, dominates at low values of $R^2$, while the SUSY-like signal peaks at higher values in the ($M_R$, $R^2$) plane ($M_R$ peaks at around 900 GeV, which is the expected value).
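The megajet prescription described above (choose the split of the jets into two groups that minimizes the sum of the squared megajet masses) can be sketched as an exhaustive search. This is a minimal illustration, assuming jets are given as four-vectors (E, px, py, pz) in GeV; the function names are hypothetical, and the brute-force loop over $2^{n-1}-1$ splits is only practical for modest jet multiplicities.

```python
from typing import List, Optional, Sequence, Tuple

FourVec = Tuple[float, float, float, float]  # (E, px, py, pz) in GeV


def add4(vectors: Sequence[FourVec]) -> FourVec:
    """Component-wise sum of four-momenta."""
    return (
        sum(v[0] for v in vectors),
        sum(v[1] for v in vectors),
        sum(v[2] for v in vectors),
        sum(v[3] for v in vectors),
    )


def mass2(v: FourVec) -> float:
    """Squared invariant mass E^2 - |p|^2."""
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2


def megajets(jets: Sequence[FourVec]) -> Tuple[FourVec, FourVec]:
    """Split jets into two megajets minimizing m1^2 + m2^2 (exhaustive search)."""
    n = len(jets)
    if n < 2:
        raise ValueError("need at least two jets")
    best_cost = float("inf")
    best: Optional[Tuple[FourVec, FourVec]] = None
    # Jet 0 always goes to megajet A, so each unordered split is visited once;
    # mask >= 1 guarantees that megajet B is non-empty.
    for mask in range(1, 2 ** (n - 1)):
        group_a: List[FourVec] = [jets[0]]
        group_b: List[FourVec] = []
        for k in range(1, n):
            if (mask >> (k - 1)) & 1:
                group_b.append(jets[k])
            else:
                group_a.append(jets[k])
        mega_a, mega_b = add4(group_a), add4(group_b)
        cost = mass2(mega_a) + mass2(mega_b)
        if cost < best_cost:
            best_cost, best = cost, (mega_a, mega_b)
    assert best is not None  # at least one split exists for n >= 2
    return best
```

The two returned four-vectors would then be fed (as three-momenta) into the razor functions for $M_R$ and $R^2$.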
To be sensitive to low-$E_T^{\rm miss}$ scenarios (small Δm), we use a lower $R^2$ threshold than in previous razor analyses [40][41][42][43]. To exploit the boosted phase space, in which the expected signal significance is greater than in the nonboosted phase space, we work at large $(m_G^2 - m_\chi^2)/m_G$ and thus at high $M_R$, allowing us to raise the $M_R$ threshold. This has the added virtue of keeping the SM backgrounds at a manageable level.
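As a worked check of the ≈900 GeV $M_R$ peak quoted for the T1ttcc benchmark, one can evaluate $(m_G^2 - m_\chi^2)/m_G$ with the gluino as G. Which particle plays the role of χ is our simplifying assumption, not a statement from the analysis: either the 325 GeV top squark, treated as effectively invisible because its charm-quark decay products are soft, or the 300 GeV neutralino itself.

$$\frac{m_{\tilde{g}}^2 - m_{\tilde{t}}^2}{m_{\tilde{g}}} = \frac{(1000~\text{GeV})^2 - (325~\text{GeV})^2}{1000~\text{GeV}} \approx 894~\text{GeV}, \qquad \frac{m_{\tilde{g}}^2 - m_{\tilde{\chi}^0_1}^2}{m_{\tilde{g}}} = \frac{(1000~\text{GeV})^2 - (300~\text{GeV})^2}{1000~\text{GeV}} = 910~\text{GeV}.$$

Both estimates lie close to the simulated peak and well above the 800 GeV baseline $M_R$ threshold of Sec. VI.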

III. THE CMS DETECTOR
A detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found elsewhere [44]. A characteristic feature of the CMS detector is its superconducting solenoid magnet, of 6 m internal diameter, which provides a field of 3.8 T. Within the field volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter. Muon detectors based on gas-ionization chambers are embedded in a steel flux-return yoke located outside the solenoid. Events are collected by a two-layer trigger system, where the first level is composed of custom hardware processors and is followed by a software-based high-level trigger.
The tracking system covers the pseudorapidity region |η| < 2.5, the muon detector |η| < 2.4, and the calorimeters |η| < 3.0. Additionally, the forward region 3 < |η| < 5 is covered by steel and quartz-fiber forward calorimeters. The near hermeticity of the detector permits an accurate measurement of the momentum balance in the transverse plane.

IV. TRIGGER AND EVENT SAMPLES
This analysis is based on a sample of proton-proton collision data at $\sqrt{s} = 8$ TeV collected by the CMS experiment in 2012, corresponding to an integrated luminosity of 19.7 fb$^{-1}$. Events are selected using two triggers, requiring either the highest jet $p_T$ or the scalar sum $H_T$ of jet transverse momenta to be above given thresholds. The jet $p_T$ threshold was 320 GeV (400 GeV for a brief data-taking period corresponding to 1.8 fb$^{-1}$), while the $H_T$ threshold was 650 GeV. The two trigger algorithms were based on a fast implementation of the particle-flow (PF) reconstruction method [45,46], which is described in Sec. V.
To measure the efficiency of these triggers, samples with unbiased jet $p_T$ and $H_T$ distributions are obtained using an independent set of triggers that require at least one electron or muon. The left-hand side of Fig. 3 shows, in the ($H_T$, leading jet $p_T$) plane, the efficiency of the requirement that events satisfy at least one of the two trigger conditions as well as the baseline selection described in Sec. VI. The trigger is fully efficient for events with $H_T > 800$ GeV. To account for the lower efficiency in the region $H_T < 800$ GeV, the trigger efficiency measured over the ($H_T$, leading jet $p_T$) plane is applied as an event-by-event weight to the simulated samples. The right-hand side of Fig. 3 shows the trigger efficiency across the ($M_R$, $R^2$) plane for the total simulated background.
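The event-by-event weighting just described amounts to a two-dimensional table lookup. The sketch below assumes the measured efficiency is stored as a NumPy array indexed by ($H_T$, leading-jet $p_T$) bins; the function name, bin edges, and efficiency values in the example are hypothetical and are not the measured map of Fig. 3.

```python
import numpy as np


def trigger_weights(ht, lead_pt, ht_edges, pt_edges, eff_map):
    """Look up per-event trigger efficiencies from a 2D (H_T, leading-jet pT) map.

    eff_map[i, j] is the efficiency in H_T bin i and pT bin j.
    Values outside the map are assigned to the nearest edge bin.
    """
    ht = np.asarray(ht, dtype=float)
    lead_pt = np.asarray(lead_pt, dtype=float)
    eff_map = np.asarray(eff_map, dtype=float)
    i = np.searchsorted(ht_edges, ht, side="right") - 1
    j = np.searchsorted(pt_edges, lead_pt, side="right") - 1
    i = np.clip(i, 0, eff_map.shape[0] - 1)
    j = np.clip(j, 0, eff_map.shape[1] - 1)
    return eff_map[i, j]


# Hypothetical binning and efficiencies, for illustration only.
ht_edges = [0.0, 650.0, 800.0, 4000.0]
pt_edges = [0.0, 320.0, 4000.0]
eff_map = [[0.10, 0.95],
           [0.60, 0.98],
           [1.00, 1.00]]
weights = trigger_weights([700.0, 900.0], [250.0, 400.0], ht_edges, pt_edges, eff_map)
```

Each simulated event would then enter histograms with its usual weight multiplied by the looked-up efficiency.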
Simulated event samples are used to investigate the characteristics of the background and signal processes. Multijet, $t\bar{t}$, W(→ℓν)+jets, Z/γ*(→ℓℓ)+jets, and Z(→νν)+jets events are generated using MadGraph 5.1.3.30 [47,48] with CTEQ6L1 [49] parton distribution functions (PDFs), while WW, WZ, and ZZ events are generated using PYTHIA 6.424 [50] with CTEQ6L1 PDFs. In what follows, W and Z bosons are collectively referred to as V. Single top quark events are generated using POWHEG 1.0 [51,52] and CT10 PDFs [53]. The cross sections for these SM processes are given in Table II. The inclusive background processes are scaled to the highest-order cross section calculation available, whereas leading-order cross sections are used for W(→ℓν)+jets, Z/γ*(→ℓℓ)+jets, and Z(→νν)+jets, which are produced with varying generator-level $H_T$ requirements. The simplified model signals are produced using MadGraph 5.1.5.4 with CTEQ6L1 PDFs. The signal cross sections are computed at next-to-leading order with next-to-leading-log corrections using PROSPINO and NLL-FAST [54][55][56][57][58][59]. The parton-level events are showered and hadronized using PYTHIA 6.426 with tune Z2* [60], which is derived from the Z1 tune [61]. The latter uses the CTEQ5L PDFs [62], whereas Z2* adopts CTEQ6L. For the background events, the response of the CMS detector is simulated in detail using a program (FullSim) based on GEANT 4 [63]. A parametrized fast detector simulation program (FastSim) is used to simulate the detector response for the signal events [64].

V. EVENT RECONSTRUCTION
We select events that have at least one interaction vertex associated with at least four charged-particle tracks. The vertex position is required to lie within 24 cm of the center of the CMS detector along the beam direction and within 2 cm of the center in the plane transverse to the beam. Because of the high instantaneous luminosity of the LHC, hard-scattering events are typically accompanied by overlapping events from multiple proton-proton interactions (pileup) and therefore contain multiple vertices. We identify the primary vertex, i.e., the vertex of the hard scatter, as the one with the highest value of $\sum p_T^2$ of the associated tracks. Detector- and beam-related filters are used to discard events with anomalous noise that mimics events with high energy and a large imbalance in transverse momentum [65,66].
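The vertex requirements and the $\sum p_T^2$ choice of primary vertex can be sketched as follows. This is a minimal illustration, assuming each vertex is described by its position in cm relative to the detector center and the transverse momenta of its associated tracks; the `Vertex` class and function names are hypothetical, treating the 24 cm and 2 cm boundaries as inclusive is our assumption, and choosing the primary vertex only among vertices that pass the quality cuts is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Vertex:
    x_cm: float
    y_cm: float
    z_cm: float
    track_pts: List[float]  # transverse momenta (GeV) of the associated tracks


def is_good_vertex(v: Vertex) -> bool:
    """Quality cuts: >= 4 tracks, |z| <= 24 cm, transverse distance <= 2 cm."""
    return (
        len(v.track_pts) >= 4
        and abs(v.z_cm) <= 24.0
        and math.hypot(v.x_cm, v.y_cm) <= 2.0
    )


def primary_vertex(vertices: Sequence[Vertex]) -> Optional[Vertex]:
    """Among good vertices, return the one with the largest sum of track pT^2."""
    good = [v for v in vertices if is_good_vertex(v)]
    if not good:
        return None  # the event fails the "at least one good vertex" requirement
    return max(good, key=lambda v: sum(pt * pt for pt in v.track_pts))
```

Squaring the track $p_T$ favors the vertex with a few hard tracks over pileup vertices with many soft ones.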
CMS reconstructs events using the PF algorithm, in which candidate particles (PF candidates) are formed by combining information from the inner tracker, the calorimeters, and the muon system. Each PF candidate is assigned to one of five object categories: muons, electrons, photons, charged hadrons, and neutral hadrons. Contamination from pileup events is reduced by discarding charged PF candidates that are incompatible with having originated from the primary vertex [67]. The average pileup energy associated with neutral hadrons is computed event by event and subtracted from the jet energy and from the energy used when computing lepton isolation, i.e., a measure of the activity around the lepton. The energy subtracted is the average pileup energy per unit area (in Δη × Δϕ) times the jet or isolation cone area [68,69]. Jets are clustered with FastJet 3.0.1 [70] using the anti-$k_T$ algorithm [71] with distance parameter ΔR = 0.5. These jets are referred to as AK5 jets. Corrections are applied as a function of jet $p_T$ and η to account for the residual effects of a nonuniform detector response. The jet energies are corrected so that, on average, they match those of simulated particle-level jets [72]. After correction, jets are required to have $p_T > 30$ GeV and |η| < 2.4. We use the combined secondary vertex algorithm [73,74] to identify jets arising from b quarks. The medium tagging criterion, which yields a misidentification rate for light-quark and gluon jets of ≈1% and a typical efficiency of ≈70%, is used to select b jets. The loose tagging criterion, with a misidentification rate of ≈10% and an efficiency of ≈85%, is used to reject events containing b jets.
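The area-based pileup subtraction and the AK5 jet acceptance can be sketched in a few lines. This is a simplified illustration under stated assumptions: the event-level pileup density ρ and each jet area are taken as given, the subtraction is applied to the jet $p_T$ rather than to the full four-momentum, and the η- and $p_T$-dependent response corrections described above are omitted. The `Jet` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Jet:
    pt: float    # GeV, before pileup subtraction
    eta: float
    area: float  # jet area in (eta, phi) units


def subtract_pileup(pt: float, rho: float, area: float) -> float:
    """Remove rho * A (pileup density times area), never going below zero."""
    return max(pt - rho * area, 0.0)


def select_jets(jets: Sequence[Jet], rho: float,
                pt_min: float = 30.0, abs_eta_max: float = 2.4) -> List[float]:
    """Return pileup-subtracted pTs of jets with pT > 30 GeV and |eta| < 2.4."""
    selected = []
    for jet in jets:
        pt_corr = subtract_pileup(jet.pt, rho, jet.area)
        if pt_corr > pt_min and abs(jet.eta) < abs_eta_max:
            selected.append(pt_corr)
    return selected


# Nominal area of an R = 0.5 cone, usable as an illustrative default jet area.
AK5_NOMINAL_AREA = math.pi * 0.5 ** 2
```

The same $\rho \times A$ offset, with A the isolation-cone area, would apply to the lepton isolation sums mentioned above.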
To identify boosted W bosons, we follow a procedure similar to that outlined in Ref. [75]. Jets are clustered with FastJet using the Cambridge-Aachen algorithm [76] with a distance parameter of 0.8, yielding CA8 jets. Jet energy corrections for these jets are derived from anti-$k_T$ jets with distance parameter ΔR = 0.7. Simulations show that these corrections are valid for CA8 jets, with an additional uncertainty of ≤2%.
The jet mass is calculated from the constituents of the jet after jet pruning, which removes the softest constituents of the jet. During jet pruning, the jet constituents are reclustered, and at each step the softer and larger-angle "protojet" of the two protojets to be merged is removed if it fails certain criteria [35,36]. A CMS study has shown that jet pruning reduces pileup effects and provides good discrimination between boosted W jets and quark/gluon (q/g) jets [37]. We define mass-tagged jets (mW) as CA8 jets with $p_T > 200$ GeV and pruned jet mass in the range $70 < m_{\rm jet} < 100$ GeV around the W boson mass.
In addition to the jet mass, we also consider the N-subjettiness [38] variables, which are obtained by first finding N candidate axes for subjets in a given CA8 jet, and then computing the quantity

$$\tau_N = \frac{1}{d_0} \sum_k p_{T,k} \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\}, \qquad d_0 = \sum_k p_{T,k} R_0, \qquad (4)$$

where $R_0$ is the original jet distance parameter, k runs over all constituent particles, and $\Delta R_{J,k}$ is the distance between constituent k and candidate subjet axis J. The subjet axes are obtained with FastJet via exclusive $k_T$ clustering, followed by a one-pass optimization to minimize the N-subjettiness value. The quantity $\tau_N$ is small if the original jet is consistent with having N or fewer subjets. Therefore, to discriminate boosted W bosons, which have two subjets, from q/g jets characterized by a single subjet, we require that a W boson mass-tagged jet satisfy $\tau_2/\tau_1 < 0.5$ for it to be classified as a W boson tagged jet (labeled W in the following). The W boson tagging efficiency depends on the CA8 jet $p_T$ and is 50%-55% according to simulation. The corresponding misidentification rate is 3%-5%. We also define W boson antitagged jets (aW) as W boson mass-tagged jets that fail the $\tau_2/\tau_1$ criterion, i.e., that have $\tau_2/\tau_1 \geq 0.5$, and use these jets to define control regions for data-driven background modeling.
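The N-subjettiness sum and the resulting jet categories (mW, W, aW) can be sketched as below. This is a minimal illustration, assuming constituents are given as ($p_T$, η, ϕ) and that the N subjet axes are supplied by the caller; the exclusive-$k_T$ axis finding with one-pass optimization used in the analysis is not reproduced, and all function names are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple

Particle = Tuple[float, float, float]  # (pT in GeV, eta, phi)
Axis = Tuple[float, float]             # (eta, phi)


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Distance in (eta, phi) with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tau_n(constituents: Sequence[Particle], axes: Sequence[Axis], r0: float = 0.8) -> float:
    """pT-weighted distance of constituents to their nearest subjet axis, over d0."""
    d0 = sum(pt for pt, _, _ in constituents) * r0
    total = sum(
        pt * min(delta_r(eta, phi, a_eta, a_phi) for a_eta, a_phi in axes)
        for pt, eta, phi in constituents
    )
    return total / d0


def w_tag_labels(jet_pt: float, pruned_mass: float, tau21: float) -> Set[str]:
    """mW: pT > 200 GeV and 70 < m < 100 GeV; then W if tau2/tau1 < 0.5, else aW."""
    if jet_pt <= 200.0 or not (70.0 < pruned_mass < 100.0):
        return set()
    return {"mW", "W"} if tau21 < 0.5 else {"mW", "aW"}
```

A two-prong jet has small $\tau_2$ relative to $\tau_1$, which is why the ratio $\tau_2/\tau_1$ (rather than either value alone) is used for the selection.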
To calculate $\vec{p}_T^{\,\rm miss}$, which enters the razor variable $R^2$ defined in Eqs. (2) and (3), we take the negative vector sum of the transverse momenta of all PF candidates in the event.
Loosely identified and isolated electrons [77] (muons [78]) with $p_T > 5$ GeV and |η| < 2.5 (2.4) are used both to suppress backgrounds in the signal region and to define the control regions. Tightly identified isolated electrons (muons) with $p_T > 10$ GeV and |η| < 2.5 (2.4) define a control region enriched in Z → ℓℓ events, from which we estimate the systematic uncertainty in the predicted number of Z → νν events in the signal region. Electron candidates that lie in the less well-instrumented transition region between the barrel and endcap calorimeters, 1.44 < |η| < 1.57, are rejected. We suppress the background from events likely to contain τ leptons, or other leptons that fail the loose selection, by discarding events with isolated tracks with $p_T > 10$ GeV and a track-primary vertex distance along the beam direction $|d_z| < 0.05$ cm.
Known differences between the properties of data and MC simulated data are corrected by weighting simulated events with data/simulation scale factors for the jet energy scale, b tag, W mass-tag, W tag, and W antitag efficiency. The W tagging-related scale factors are described in Sec. VII. In addition, event-by-event weights are used to correct the simulated data so that their pileup, trigger, top quark p T , and ISR characteristics match those of the data.

VI. ANALYSIS STRATEGY AND EVENT SELECTION
We search for deviations from the SM in the high-$M_R$, high-$R^2$ region using events with at least one boosted W boson, at least one b-tagged jet, and no isolated leptons or tracks. SM backgrounds in the signal region S are estimated using observations in control regions and scale factors, calculated from MC simulation, that relate the number of events in one region to that in another. Three control regions, Q, T, and W, select high-purity samples of multijet, $t\bar{t}$, and W(→ℓν)+jets events, respectively. Details of the background estimation method are given in Sec. VIII.

Events must satisfy the following baseline selection: (1) have at least one good primary vertex (see Sec. V); (2) pass all detector- and beam-related filters (see Sec. V); (3) have at least three selected AK5 jets, of which at least one has $p_T > 200$ GeV, thereby defining the boosted phase space; and (4) satisfy $M_R > 800$ GeV and $R^2 > 0.08$, where the megajets are constructed from the selected AK5 jets. The details of the event selection beyond the baseline selection are given in Table I. The signal and control regions are defined using different requirements on the multiplicities of leptons, b-tagged jets, and W-tagged jets, and on kinematic variables that discriminate between different processes. The multijet-enriched control sample Q is used for estimating the multijet background in the S and T regions. To characterize Q, we use the fact that $E_T^{\rm miss}$ in multijet events is largely due to jet mismeasurements rather than to the escape of particles that interact weakly with the detector; consequently, $\vec{p}_T^{\,\rm miss}$ will often be aligned with one of the jets. Therefore, a good discriminant between multijet events and events with genuine $E_T^{\rm miss}$ is

$$\Delta\phi_{\min} = \min_i \Delta\phi\left(\vec{p}_T^{\,j_i}, \vec{p}_T^{\,\rm miss}\right), \qquad (5)$$

that is, the minimum of the azimuthal angles between $\vec{p}_T^{\,\rm miss}$ and the transverse momentum of each jet, where i runs over the three leading AK5 jets.
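The $\Delta\phi_{\min}$ discriminant can be computed directly from the jet and $\vec{p}_T^{\,\rm miss}$ azimuthal angles. The sketch below is a minimal illustration, assuming the jet azimuths are already ordered by decreasing $p_T$; the function names are hypothetical, and the region-defining cut values (listed in Table I) are deliberately not hard-coded.

```python
import math
from typing import Sequence


def delta_phi(phi1: float, phi2: float) -> float:
    """Absolute azimuthal difference folded into [0, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def delta_phi_min(jet_phis: Sequence[float], met_phi: float, n_leading: int = 3) -> float:
    """Smallest |dphi| between p_T^miss and the n leading jets.

    jet_phis must be ordered by decreasing jet pT.
    """
    return min(delta_phi(phi, met_phi) for phi in jet_phis[:n_leading])
```

Only the three leading jets enter, so a soft fourth jet aligned with $\vec{p}_T^{\,\rm miss}$ does not change the result.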
Since detector inaccuracies mostly cause undermeasurements of the jet energy and momentum, the variable $\Delta\phi_{\min}$ provides reliable discrimination against fake $E_T^{\rm miss}$ in multijet events. The T and W control regions are used to characterize the $t\bar{t}$ and W+jets backgrounds, respectively, in the S region. The contamination in the S region from fully hadronic decays of $t\bar{t}$ pairs is negligible because they do not produce sufficient genuine $E_T^{\rm miss}$ to satisfy our event selection. The $t\bar{t}$ contamination thus consists of semileptonic decays of $t\bar{t}$ pairs in which one W boson is boosted and the other W boson decays to a charged lepton that is not identified. Therefore, the T region is required to have a lepton from the decay of a W boson, at least one b-tagged jet, and a W-tagged jet. Similarly, the W+jets contribution in the S region comes from leptonic W boson decays in which the charged lepton is not identified and a jet is misidentified as a W jet. Therefore, we require events in the W region to have a lepton from the W boson decay and a mass-tagged boosted W jet, i.e., a quark- or gluon-initiated jet misidentified as a boosted W boson. The N-subjettiness criterion is not imposed, in order to maintain high event yields in these control regions and therefore higher statistical precision.

TABLE I. Summary of the selections used, in addition to the baseline selection, to define the signal region (S), the three control regions (Q, T, W), and the two regions (S′, Q′) used for the cross-checks described later in the text.

In the T and W regions, we suppress potential signals using the transverse mass

$$m_T = \sqrt{2\, p_T^{\ell}\, E_T^{\rm miss} \left(1 - \cos\Delta\phi\right)}, \qquad (6)$$

where Δϕ is the difference in azimuthal angle between the lepton $\vec{p}_T$ and $\vec{p}_T^{\,\rm miss}$, and $p_T^{\ell}$ is the magnitude of the lepton $\vec{p}_T$. The $m_T$ distribution exhibits a kinematic edge at the mass of the W boson for $t\bar{t}$ and W(→ℓν)+jets processes. However, such an edge is not present for signal events because of the extra contribution to $E_T^{\rm miss}$ from neutralinos, which escape direct detection.
Therefore, potential signals are suppressed in the T and W regions by requiring $m_T < 100$ GeV. For the W region, we additionally require $m_T > 30$ GeV to reduce residual contamination from multijet events, which are expected to have small $E_T^{\rm miss}$ and therefore small $m_T$. Table I lists two additional control regions, S′ and Q′, which are used in the cross-checks described later in this section. Figure 4 shows the simulated distributions of $M_R$ and $R^2$ in the signal region, where the smoothly falling nature of the backgrounds, as well as their relative contributions, can be observed. The $m_T$ distribution in the T and W regions prior to the $m_T$ and $\Delta\phi_{\min}$ selection is shown in Fig. 5, while Fig. 6 shows the $\Delta\phi_{\min}$ distribution in the Q region, for both data and simulated backgrounds. Overall, there is reasonable agreement between the observed and simulated yields. The discrepancies are accommodated by the systematic uncertainties we assign to the simulated yields.
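The transverse mass and the two $m_T$ windows can be expressed compactly. The sketch below is a minimal illustration, assuming the lepton $p_T$, $E_T^{\rm miss}$, and their azimuthal separation are already available and treating the lepton as massless; the function names are hypothetical.

```python
import math


def transverse_mass(lep_pt: float, met: float, dphi: float) -> float:
    """m_T of the lepton + p_T^miss system (massless lepton)."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def passes_mt_window(mt: float, region: str) -> bool:
    """m_T requirements: T region m_T < 100 GeV; W region 30 < m_T < 100 GeV."""
    if region == "T":
        return mt < 100.0
    if region == "W":
        return 30.0 < mt < 100.0
    raise ValueError(f"no m_T requirement defined for region {region!r}")
```

For a W boson decaying at rest in the transverse plane, a back-to-back lepton and neutrino each carrying 40 GeV give $m_T$ = 80 GeV, just below the kinematic edge that the 100 GeV upper cut exploits.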
In Table II, we show the expected number of events obtained from simulation for the different background processes and for the example T1ttcc model with $m_{\tilde{g}} = 1$ TeV, $m_{\tilde{t}} = 325$ GeV, and $m_{\tilde{\chi}^0_1} = 300$ GeV. The observed event counts after different levels of selection, beyond the trigger requirement, are also reported. The background composition in percent after the baseline, S, Q, T, and W region selections is reported in Table III. The signal region is $t\bar{t}$ dominated, with additional contributions from W(→ℓν)+jets and multijet processes. Each control region, Q, T, and W, has high purity for the background process it targets: 90% multijet, 83% $t\bar{t}$ plus single top quark, and 85% W(→ℓν)+jets, respectively. The discrepancies between the observations and the simulation are due to uncertainties in the MC modeling, especially for the multijet processes. We do not explicitly estimate the background in the signal region. Rather, from the observations in the control regions, we create a prior distribution (described in Sec. VIII) for the four background components of the signal region that incorporates all statistical and systematic uncertainties.

TABLE II. "Other" refers to the sum of the small background components Z/γ* → ℓℓ+jets, triboson, and $t\bar{t}V$. The signal is the T1ttcc model with $m_{\tilde{g}} = 1$ TeV, $m_{\tilde{t}} = 325$ GeV, and $m_{\tilde{\chi}^0_1} = 300$ GeV. The row corresponding to $n_{\rm PV} > 0$ gives the event counts after applying the noise filters, pileup reweighting, top $p_T$ reweighting for $t\bar{t}$, ISR reweighting for the signal, and the requirement of at least one primary vertex. The column listing the total number of background events also includes some processes that only contribute at the early stages of the event selection. The cross sections used for each sample are listed in the second line of the header. Several of the simulated background samples were produced with generator-level selections applied, which are not fully covered by the first selection levels listed in this table.

However, to verify that the control regions in data provide adequate models for backgrounds in the signal region and that the translations between different regions behave as expected, we perform two cross-checks, taking into account statistical uncertainties only. In the first cross-check, we predict the background in a signal-like control region and compare these predictions with the observations in that region. This control region, denoted by S′, is defined by inverting the $\Delta\phi_{\min}$ requirement while preserving the rest of the signal selection. The estimated numbers of events in the S′ region for the multijet, W(→ℓν)+jets, and top quark processes, $\hat{N}^{S'}_{\rm multijet}$, $\hat{N}^{S'}_{W(\to\ell\nu)}$, and $\hat{N}^{S'}_{\rm TTJ+T}$, are computed with Eqs. (7)-(9), while the estimated number of multijet events in control region T, $\hat{N}^{T}_{\rm multijet}$, is given by Eq. (10). In Eqs. (7)-(10), the superscripts denote one of the control regions, while the subscripts "other," W(→ℓν), TTJ+T, and multijet denote the sum of the small backgrounds, W(→ℓν)+jets, $t\bar{t}$ plus single top quark, and multijet, respectively, and "obs" labels observed counts. These equations are used only in this cross-check.
However, they incorporate the same relations between signal and control regions as will be used in the likelihood procedure described in Sec. VIII. As can be seen from Table III, the nominal choice of the parameters associated with systematic uncertainties leads to $N^{T}_{\rm multijet,MC} = 0$. The total estimated background in S′ is

$$\hat{N}^{S'} = \sum_i \hat{N}^{S'}_i, \qquad (11)$$

where i runs over all background processes. For the smaller backgrounds, $\hat{N}^{S'}_i$ is determined from simulation. Backgrounds are estimated bin by bin in the ($M_R$, $R^2$) space, where the bin boundaries are given in Table V. However, the estimated scale factors are global, as the statistical precision is not sufficient to yield reliable bin-by-bin estimates. The expected global scale factors, which we denote by κ, are defined in Sec. VIII, which also describes how they are calculated. Figure 7 shows the projections onto the $M_R$ and $R^2$ axes of the predicted and observed distributions in the S′ region. The prediction agrees with the observation within ≈20%. This cross-check of the background modeling shows that it is feasible to estimate a multicomponent background in a signal-like region using the control regions we have defined.
In the second cross-check, we use the Q region to estimate the background in a signal-like Q region, denoted by Q′, for which $\Delta\phi_{\min} > 0.5$, from the relationship

$$\hat{N}^{Q'} = \frac{N^{Q'}_{\rm MC}}{N^{Q}_{\rm MC}}\, N^{Q}_{\rm obs}. \qquad (12)$$

Here, $N_{\rm MC}$ includes all contributing background processes, and $N^{Q}_{\rm obs}$ is the observed count in the Q region. This test assesses the degree to which the simulated distribution of $\Delta\phi_{\min}$, as well as its extrapolation from the Q region to the S region, is reliable. As can be seen from Table III, the multijet process makes only a small contribution in the Q′ region. Therefore, this cross-check assesses how well the reduction of the multijet process via the $\Delta\phi_{\min} > 0.5$ requirement is modeled. The prediction and the observation can be compared using the data shown in Fig. 8. The level of discrepancy between the prediction and the observation in this cross-check is incorporated as a systematic uncertainty of 42% in the global scale factor for the multijet component, as described in Sec. VIII.

TABLE III. Background composition according to simulation after the baseline, S, Q, T, W, Q′, and S′ region selections. "Other" refers to the sum of the small background components Z/γ* → ℓℓ, triboson, and $t\bar{t}V$.
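The Q′ cross-check is a transfer-factor extrapolation: observed control-region counts are scaled by a simulated ratio between the target and control regions. The sketch below assumes that simple ratio form, $\hat{N}^{Q'} = (N^{Q'}_{\rm MC}/N^{Q}_{\rm MC})\,N^{Q}_{\rm obs}$ with $N_{\rm MC}$ summed over all simulated backgrounds, applied per ($M_R$, $R^2$) bin. The function names are hypothetical, and `relative_discrepancy` is only one simple way to quantify the mismatch; it is not claimed to be how the 42% uncertainty was derived.

```python
import numpy as np


def transfer_prediction(n_obs_control, n_mc_target, n_mc_control):
    """Scale observed control-region counts by the simulated target/control ratio.

    All arguments may be scalars or per-bin arrays over the (M_R, R^2) bins.
    """
    n_obs_control = np.asarray(n_obs_control, dtype=float)
    ratio = np.asarray(n_mc_target, dtype=float) / np.asarray(n_mc_control, dtype=float)
    return ratio * n_obs_control


def relative_discrepancy(observed, predicted):
    """|obs - pred| / pred for each bin."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(observed - predicted) / predicted


# Hypothetical two-bin example.
pred = transfer_prediction([1000.0, 200.0], [50.0, 10.0], [500.0, 100.0])
```

Because the simulation enters only through a ratio, overall normalization mismodeling in simulation cancels, and only the modeling of the Q to Q′ extrapolation is tested.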

VII. THE W BOSON TAGGING SCALE FACTORS
The W boson tagger used in this analysis is the same as that defined and used in previous CMS analyses [75,79]. Since the W boson tagging efficiency does not depend significantly on the event topology, we use the same scale factor [75] as in these previous analyses to correct for modeling differences in the W boson tagging efficiency between FullSim and data, and apply it to processes with genuine hadronically decaying W bosons (mainly $t\bar{t}$ and signal) in the S and T regions. In contrast, the data/FullSim scale factors for the misidentification (mistag) efficiency for mass-tagged, antitagged, and tagged W bosons are derived specifically for this analysis. The mistag efficiency is defined as the probability to tag, with one of the W taggers, a jet not originating from the hadronic decay of a W boson. Scale factors are needed to correct the mistag efficiencies for W boson antitagging and mass tagging in the MC simulation of the Q and W control regions, respectively, whereas the mistag efficiency scale factor for W boson tagging is used to correct simulated events with misidentified W bosons, e.g., multijet or W(→ℓν)+jets events, in the S and T regions. All three mistag efficiency scale factors are derived using the same multijet-enriched control region, defined as region Q without any of the selections related to the razor variables and W tagging. To obtain the mistag efficiencies $\epsilon_f$ for W boson tagging, mass tagging, and antitagging, we use the leading CA8 jet in each event and measure the fraction of these jets passing the given tagger. After obtaining $\epsilon_f$ in both data and FullSim, we compute the scale factor

$$SF = \frac{\epsilon_f^{\rm data}}{\epsilon_f^{\rm FullSim}}. \qquad (13)$$

The scale factors for the W boson tagging, mass tagging, and antitagging mistag efficiencies vary between 1.0 and 1.2, 1.1 and 1.4, and 1.2 and 1.5, respectively, depending on the CA8 jet $p_T$.
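Measuring a mistag efficiency as the fraction of leading CA8 jets that pass a tagger, in bins of jet $p_T$, and taking the data-to-FullSim ratio can be sketched with NumPy histograms. This is a minimal illustration, assuming per-jet arrays of $p_T$ and pass/fail flags are available for data and simulation; the function names and the example binning are hypothetical, and statistical uncertainties are not propagated.

```python
import numpy as np


def binned_efficiency(jet_pt, passes, pt_edges):
    """Fraction of jets passing a tagger in each pT bin (NaN for empty bins)."""
    jet_pt = np.asarray(jet_pt, dtype=float)
    passes = np.asarray(passes, dtype=bool)
    total, _ = np.histogram(jet_pt, bins=pt_edges)
    passed, _ = np.histogram(jet_pt[passes], bins=pt_edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, passed / total, np.nan)


def scale_factor(eff_data, eff_sim):
    """Bin-by-bin ratio of data to simulated efficiencies."""
    return np.asarray(eff_data, dtype=float) / np.asarray(eff_sim, dtype=float)


# Hypothetical pT binning (GeV) and toy jets, for illustration only.
edges = [200.0, 400.0, 600.0]
eff_data = binned_efficiency([250.0, 260.0, 450.0, 460.0], [True, False, True, True], edges)
eff_sim = binned_efficiency([210.0, 220.0, 230.0, 240.0, 250.0, 500.0],
                            [True, True, False, False, False, True], edges)
sf = scale_factor(eff_data, eff_sim)
```

In use, each simulated jet that is mistagged would be weighted by the scale factor of its $p_T$ bin.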
The uncertainties in the scale factors include the statistical uncertainty as well as the trigger efficiency and jet energy scale uncertainties, and vary between 2% and 7% depending on the CA8 jet p_T. Because the signal processes are simulated with FastSim, the resulting tagging efficiencies must be corrected for modeling differences between FastSim and FullSim. To compute the FullSim/FastSim scale factor for the W boson tagging efficiency, we use samples of tt̄ events simulated with both FullSim and FastSim. We first determine the W boson tagging efficiency for both samples, considering only events with exactly one hadronically decaying W boson at the generator level for which the closest reconstructed CA8 jet lies within ΔR = 0.8 of the W boson. Since we wish to select boosted W bosons, and not boosted top quarks, we require that there be no (generator-level) b quark from the top quark decay within the cone of the closest CA8 jet. The W boson tagging efficiency as a function of p_T for a given sample is then obtained by dividing the p_T distribution of the closest CA8 jets that also satisfy the tagging condition (70 < m_jet < 100 GeV and τ2/τ1 < 0.5) by the p_T distribution of all of the closest CA8 jets. The FullSim/FastSim scale factor is the ratio of the efficiencies ε obtained from the two samples,

$$\mathrm{SF}_{\text{Full/Fast}}(p_T) = \frac{\epsilon_{\text{FullSim}}(p_T)}{\epsilon_{\text{FastSim}}(p_T)}.$$

This scale factor is applied to all signal samples and varies between 0.89 and 0.95, depending on the p_T of the given CA8 jet, with an uncertainty of less than 3%.
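The generator-matched efficiency described above can be sketched as follows. For each event, the code takes the CA8 jet closest to the generator-level W boson and requires it to be matched within the cone. It then vetoes events with a generator-level b quark inside that cone, and counts tagged jets per p_T bin. The event dictionary layout, the function names, and the strict "< 0.8" boundary are assumptions for illustration. Events are assumed to be pre-filtered to contain exactly one hadronic W boson.

```python
import math
from bisect import bisect_right


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def is_w_tagged(mass, tau21):
    return 70.0 < mass < 100.0 and tau21 < 0.5


def w_tag_efficiency(events, pt_edges, cone=0.8):
    """W tagging efficiency vs pT for events with one generator-level hadronic W.

    Each event (hypothetical format) is a dict with
      "w": (eta, phi) of the generator-level W boson,
      "b_quarks": list of (eta, phi) of generator-level b quarks from top decays,
      "jets": list of (pt, eta, phi, mass, tau21) for reconstructed CA8 jets.
    """
    n_bins = len(pt_edges) - 1
    n_tag = [0] * n_bins
    n_all = [0] * n_bins
    for ev in events:
        if not ev["jets"]:
            continue
        w_eta, w_phi = ev["w"]
        pt, eta, phi, mass, tau21 = min(
            ev["jets"], key=lambda j: delta_r(w_eta, w_phi, j[1], j[2]))
        if delta_r(w_eta, w_phi, eta, phi) >= cone:
            continue  # closest jet is not matched to the W boson
        if any(delta_r(b_eta, b_phi, eta, phi) < cone for b_eta, b_phi in ev["b_quarks"]):
            continue  # b quark inside the jet cone: looks like a boosted top quark
        i = bisect_right(pt_edges, pt) - 1
        if 0 <= i < n_bins:
            n_all[i] += 1
            if is_w_tagged(mass, tau21):
                n_tag[i] += 1
    return [t / a if a else float("nan") for t, a in zip(n_tag, n_all)]


def full_over_fast(eff_full, eff_fast):
    """SF_Full/Fast(pT) = eps_FullSim(pT) / eps_FastSim(pT), bin by bin."""
    return [f / s for f, s in zip(eff_full, eff_fast)]
```

Running `w_tag_efficiency` separately on the FullSim and FastSim tt̄ samples and dividing the results with `full_over_fast` gives a binned scale factor. That factor would then be applied to each signal jet according to its p_T.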

VIII. STATISTICAL ANALYSIS
The statistical analysis of the observations in the signal region is based on a likelihood function L(σ), given by

$$L(\sigma) = \int \prod_{i=1}^{M} p(N^S_i \mid \sigma, L, \theta_i)\, \pi(L)\, \pi(\theta_1,\ldots,\theta_M)\, dL\, d\theta_1 \cdots d\theta_M, \quad (15)$$

where σ is the total signal cross section, M = 25 is the number of bins in the (M_R, R²) plane, N^S_i is the observed count in bin i of the signal region, and the bin-by-bin parameters ε, b^S_multijet, b^S_TTJ, b^S_W(→ℓν), and b^S_other are denoted collectively by θ. The parameter ε represents the M signal efficiencies (including acceptance) for a given signal model, while the bin-by-bin background parameters for a given background process in the S region are denoted by b^S_process. The function π(L) is the prior for the integrated luminosity, and π(θ_1, …, θ_M) is an evidence-based prior constructed from the observations in the control regions and the four global scale factors

$$\kappa^{A/B}_{\text{process}} = \frac{\sum_i b^A_{\text{process,MC},i}}{\sum_i b^B_{\text{process,MC},i}},$$

where the sums are over all bins of the simulated data, and A and B denote any of the S, Q, T, or W regions.
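The global scale factors are simple ratios of summed simulated counts. This short sketch computes κ^{A/B} and uses it to form the Poisson mean of one control-region bin, κ^{A/S} b^S + b^A_other, which is the form used for region Q in step (7) of the sampling procedure in this section. The function names and toy counts are hypothetical.

```python
def global_scale_factor(b_mc_a, b_mc_b):
    """kappa^{A/B}: ratio of simulated background sums over all bins of regions A and B."""
    return sum(b_mc_a) / sum(b_mc_b)


def control_region_mean(kappa_a_over_s, b_signal, b_other):
    """Poisson mean in one bin of control region A: kappa^{A/S} * b^S + b^A_other."""
    return kappa_a_over_s * b_signal + b_other


# Toy simulated multijet counts in the Q and S regions (three bins each)
kappa_q_s = global_scale_factor([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
print(kappa_q_s)                                  # 10.0
print(control_region_mean(kappa_q_s, 2.0, 1.5))   # 21.5
```

Because κ is a ratio of sums, a systematic shift that scales all simulated bins of both regions by the same factor cancels. This is consistent with the remark in Sec. IX that only ratios of simulated counts enter the analysis.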
The association of the global scale factors with the control regions is shown in Fig. 9, which also shows which control regions provide constraints on the background parameters b^S_process. Although we use the same global scale factors in each bin, shape uncertainties in the simulated distributions are accounted for by allowing the uncertainty in the scale factors to be bin dependent. The 25 signal bins in the (M_R, R²) plane are divided into three sets to which different uncertainties are applied: the four bins nearest the origin (set 1), the five surrounding bins (set 2), and the remaining bins (set 3).

FIG. 9. Graphical representation of the analysis method. The circles represent the signal (S) and control (Q, T, W) regions, with their definitions summarized in the associated boxes. Listed inside each circle are the likelihood parameters relevant to that region: the bin-by-bin background parameters b^region_process for the given region and background process, as well as the global scale factors κ^{A/B}_process = Σ_i b^A_process,MC,i / Σ_i b^B_process,MC,i, where the sums are over all bins of the simulated data. A connection between two regions indicates that one or more parameters are shared. The total expected background per (M_R, R²) bin is the sum of the terms shown for each region. Furthermore, associated with each bin of each region are an observed count N^region, a simulated count N^region_process,MC, and a count N^region_other,MC equal to the sum of the smaller backgrounds, Z/γ* → ℓℓ+jets, diboson, triboson, and tt̄V, with an associated likelihood parameter b^region_other.

The likelihood per bin is taken to be p(N^S | σ, L, θ) = Poisson(N^S; εσL + b^S_multijet + b^S_TTJ + b^S_W(→ℓν) + b^S_other). The integral in Eq. (15) is approximated using MC integration: the priors π(L) and π(θ_1, …, θ_M) are sampled, and the multibin likelihood is averaged over the sampled points {(L, θ_1, …, θ_M)}. The priors for the expected integrated luminosity L, the signal efficiencies ε, and the simulated background counts b^region_process,MC are modeled with gamma densities,

$$\mathrm{Ga}(x;\gamma,\beta) = \beta^{-1}(x/\beta)^{\gamma-1} e^{-x/\beta}/\Gamma(\gamma), \quad (16)$$

in which the mode is set to c and the variance to δc², where c ± δc denotes either the measured integrated luminosity or, for a given bin of a given region and process, the simulated signal efficiency or the simulated background count. Since the mode of Eq. (16) is (γ − 1)β and its variance is γβ², we calculate the gamma density parameters from c ± δc as

$$\gamma = \frac{(k+2) + \sqrt{(k+2)^2 - 4}}{2}, \quad (17)$$

$$\beta = \frac{\sqrt{c^2 + 4\,\delta c^2} - c}{2}, \quad (18)$$

where k = (c/δc)². For empty bins, we set γ = 1, and the bin value is constrained to zero by setting β = 10⁻⁴. For the signal efficiencies and backgrounds, the prior is modeled hierarchically,

$$\pi(\theta_1,\ldots,\theta_M) = \int \prod_{i=1}^{M} \pi(\theta_i \mid \tilde{c}_i)\, \pi(\tilde{c}_1,\ldots,\tilde{c}_M \mid \phi)\, \pi(\phi)\, d\tilde{c}_1 \cdots d\tilde{c}_M\, d\phi, \quad (19)$$

where φ represents parameters that characterize the independent sources of systematic uncertainty, described in Sec. IX. The integral in Eq. (19) is evaluated as follows: φ values are sampled from π(φ) following the procedure described in Sec. IX, then c̃_i values from π(c̃_1, …, c̃_M | φ), and then θ_i values from π(θ_i | c̃_i). Sampling from π(φ) and π(θ_i | c̃_i) is straightforward because their functional forms are known. However, sampling the c̃_i requires running the analysis multiple times, which yields an ensemble of histograms in the (M_R, R²) plane; this ensemble is the output of the procedure described in Sec. IX.
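To make the prior construction and the MC integration concrete, the sketch below does two things. It solves (γ − 1)β = c and γβ² = δc² for the gamma parameters, and it averages the product of per-bin Poisson terms over points drawn from those gamma priors. It is a deliberate simplification, not the analysis code. Each bin has a single total background drawn from its own gamma prior, in place of the hierarchical, control-region-constrained prior, and the function names are hypothetical. Python's `random.gammavariate(alpha, beta)` uses the same shape/scale convention as Ga(x; γ, β).

```python
import math
import random


def gamma_params(c, dc):
    """Shape and scale of Ga(x; gamma, beta) with mode c and variance dc**2.

    Solves (gamma - 1) * beta = c and gamma * beta**2 = dc**2 (requires dc > 0).
    Empty bins (c == 0) use gamma = 1 and beta = 1e-4, as in the text.
    """
    if c == 0.0:
        return 1.0, 1e-4
    k = (c / dc) ** 2
    shape = (k + 2.0 + math.sqrt(k * (k + 4.0))) / 2.0
    # Same as (sqrt(c^2 + 4 dc^2) - c) / 2, rearranged to avoid cancellation.
    scale = 2.0 * dc ** 2 / (c + math.sqrt(c ** 2 + 4.0 * dc ** 2))
    return shape, scale


def log_poisson(n, mu):
    return n * math.log(mu) - mu - math.lgamma(n + 1) if n > 0 else -mu


def mc_likelihood(sigma, n_obs, eff, bkg, lumi, n_samples=2000, seed=1):
    """Monte Carlo estimate of L(sigma): the product of per-bin Poisson terms
    averaged over points drawn from gamma priors.

    eff, bkg: per-bin (c, dc) pairs; lumi: (c, dc). Simplification: each bin
    has one total background drawn directly from its own gamma prior.
    """
    rng = random.Random(seed)
    lumi_p = gamma_params(*lumi)
    eff_p = [gamma_params(*x) for x in eff]
    bkg_p = [gamma_params(*x) for x in bkg]
    total = 0.0
    for _ in range(n_samples):
        lum = rng.gammavariate(*lumi_p)
        log_l = 0.0
        for n, e_p, b_p in zip(n_obs, eff_p, bkg_p):
            mu = rng.gammavariate(*e_p) * sigma * lum + rng.gammavariate(*b_p)
            log_l += log_poisson(n, mu)
        total += math.exp(log_l)
    return total / n_samples
```

As a check, with very small uncertainties the estimate should approach the plain Poisson probability. For example, `mc_likelihood(1.0, [3], [(0.1, 1e-4)], [(2.0, 1e-3)], (10.0, 1e-3))` is close to e⁻³·3³/3!, about 0.224.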
Thereafter, the sampling, which yields the points {(L, θ_1, …, θ_M)}, proceeds as follows: (1) sample the integrated luminosity parameter; (2) sample the efficiency parameters ε for every bin and every signal model; (3) sample the background parameters b^region_process,MC for every bin and every background; (4) scale b^Q_multijet,MC by a random number sampled from a gamma density with unit mode and standard deviation 0.36 in order to induce the 42% uncertainty in the multijet global scale factor κ^{Q/S}_multijet that accounts for deficiencies in the modeling of multijet production, as derived from the second cross-check described in Sec. VI; (5) compute the κ parameters from the appropriate background sums, for example, κ^{Q/S}_multijet = Σ_i b^Q_multijet,MC,i / Σ_i b^S_multijet,MC,i; (6) scale each κ value by a random number sampled from a gamma density with unit mode and a standard deviation of either 0.5 or 1.0 for the bins in set 2 or set 3, respectively, to account for the larger uncertainties in the tails of the simulated distributions; and (7) sample the background parameters b^S_multijet, b^S_TTJ, and b^S_W(→ℓν) from the Poisson models of the control regions. For example, for region Q, Poisson(N^Q; κ^{Q/S} b^S_multijet + b^Q_other) is mapped to a posterior density in b^S_multijet using a flat prior in b^S_multijet, and b^S_multijet is sampled from this posterior density.

If no statistically significant signal is observed, we determine limits on the total signal cross section using the CLs criterion [80–82] and the test statistic t_σ = 2 ln[L(σ̂)/L(σ)] when 0 ≤ σ̂ ≤ σ, and t_σ = 0 when σ̂ > σ, where σ̂ is the best-fit value of σ. Large values of t_σ indicate incompatibility between the best-fit hypothesis σ′ = σ̂ and the hypothesis σ′ = σ being tested. Given the p values p_0 = Pr(t_σ > t_σ,obs | σ′ = 0) and p_σ = Pr(t_σ > t_σ,obs | σ′ = σ), obtained by simulation, a 95% CLs upper limit on the cross section is obtained by solving CLs(σ) = p_σ/p_0 = 0.05.
The quantity t_σ,obs denotes the observed value of the test statistic, one for each hypothesis σ′ = σ.
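The CLs procedure can be illustrated on a single counting bin with Poisson mean b + σa, where a stands for efficiency × integrated luminosity. The statistic is the one-sided t_σ defined above. This is a toy sketch under stated assumptions: there are no nuisance parameters, the p values are computed by exact enumeration of Poisson outcomes rather than by simulation, and the tail is taken as t_σ ≥ t_σ,obs, a common convention for discrete statistics. The function names are hypothetical.

```python
import math


def log_poisson(n, mu):
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def t_stat(n, sigma, b, a):
    """One-sided statistic t_sigma for a single bin with Poisson mean b + sigma * a."""
    sigma_hat = max(0.0, (n - b) / a)  # best fit, constrained to be non-negative
    if sigma_hat > sigma:
        return 0.0
    return 2.0 * (log_poisson(n, b + sigma_hat * a) - log_poisson(n, b + sigma * a))


def p_value(t_obs, sigma_test, sigma_true, b, a):
    """Pr(t_sigma_test >= t_obs | sigma_true), by summing Poisson probabilities."""
    mu_true = b + sigma_true * a
    n_max = int(mu_true + 20.0 * math.sqrt(mu_true) + 50)
    return sum(math.exp(log_poisson(n, mu_true))
               for n in range(n_max + 1)
               if t_stat(n, sigma_test, b, a) >= t_obs)


def cls(sigma, n_obs, b, a):
    t_obs = t_stat(n_obs, sigma, b, a)
    return p_value(t_obs, sigma, sigma, b, a) / p_value(t_obs, sigma, 0.0, b, a)


def cls_upper_limit(n_obs, b, a, cl=0.95, tol=1e-9):
    """Solve CLs(sigma) = 1 - cl by bisection (CLs assumed to decrease with sigma)."""
    target = 1.0 - cl
    hi = 1.0
    while cls(hi, n_obs, b, a) > target:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, n_obs, b, a) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(cls_upper_limit(0, b=3.0, a=1.0), 4))  # 2.9957, i.e. ln(20)
```

For zero observed events, only n = 0 satisfies t_σ ≥ t_σ,obs. Then CLs = e^{−σa} for any b > 0, so the 95% limit is σa = ln 20 ≈ 3.0, the familiar background-independent CLs result. The multibin analysis replaces the single Poisson term with the MC-integrated likelihood of Eq. (15).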

IX. SYSTEMATIC UNCERTAINTIES
The input to the statistical analysis is an ensemble of histograms in the (M_R, R²) plane that incorporate systematic uncertainties in the simulated signal and background samples. The independent systematic effects, described below, are sampled simultaneously. For each sampled systematic effect, a single Gaussian variate with zero mean and unit variance is used to calculate the random shift due to that effect in all signal and background models. Likewise, the same randomly sampled PDFs are used for all signal and background models. In this way, the statistical dependencies among all bins of the signal and background models are modeled correctly and automatically. The sampling of the systematic effects is repeated several hundred times. In all cases except those associated with PDFs, the systematic uncertainties are in the scale factors (SF) applied to the simulated samples to correct them for modeling deficiencies.

V. KHACHATRYAN et al. PHYSICAL REVIEW D 93, 092009 (2016)

We consider the systematic uncertainties in the following quantities:
(i) Jet energy scale. The uncertainties depend on jet p_T and η [72].
(ii) Parton distribution functions. We use 100 randomly sampled sets of PDFs from NNPDF23_lo_as_0130_qed [83], MSTW2008lo68cl [84], and CT10 [53]. The samples for the latter two are generated using the program HESSIAN2REPLICAS, recently released with LHAPDF6 [85]. Given a sampled set i of PDF set K, and the PDF set O with which the events were simulated, events are reweighted using the scale factors SF_K,i = w_K,i/w_O, where the weights w are products of the event-by-event PDFs for the colliding partons.
(iii) Trigger efficiency. We take the uncertainty in each bin, as a function of H_T and leading-jet p_T, to be the maximum of the statistical uncertainty in the efficiency after the baseline selection and the difference between the efficiencies before and after the baseline selection.
the signal samples using an event weight that depends on the p_T of the recoiling system. The associated systematic uncertainty is equal to the difference 1 − w_ISR, where w_ISR is the ISR event weight.
(viii) Top quark transverse momentum. Differential top quark pair production cross section analyses have shown that the p_T spectrum of top quarks in data is softer than predicted [86]. To account for this, we reweight events in the tt̄ simulation based on the p_T of the generator-level t and t̄ quarks. The uncertainty associated with this reweighting is taken to be equal to the full size of the reweighting.
(ix) Pileup. Simulated events are reweighted so that their vertex multiplicity distribution matches that observed in data. The minimum-bias cross section is varied by ±5%, which changes the shape of the vertex multiplicity distribution and therefore the weights.
(x) Multijet spectrum. The cross-checks described in Sec. VI showed that there is a 42% uncertainty in the multijet scale factor κ between the S and Q regions. This uncertainty is incorporated by increasing the uncertainty in the κ parameter, as described in Sec. VIII.
(xi) Z(→νν)+jets prediction. About 8% of the background in the signal region consists of Z(→νν)+jets events. Since we require at least one b-tagged jet, and given the known deficiency in modeling Z production in association with heavy-flavor quarks [87], we include an extra systematic uncertainty in the Z(→νν)+jets contribution. This uncertainty is estimated using a data control region enriched in Z(→ℓℓ)+jets, required to have exactly two tight leptons with the same flavor (e or μ) and opposite charge, 60 < m_ℓℓ < 120 GeV, at least one b-tagged jet, and at least one W mass-tagged jet. We estimate the uncertainty by first computing bin-by-bin data/simulation ratios in this control region. Then, we take the uncertainty in the ratio in each bin as the standard deviation of a Gaussian density, normalized to the number of events in that bin.
Finally, the Gaussian densities from all bins are superposed, and the uncertainty is taken to be the magnitude of the 68% band around a ratio of unity.

As noted above, all systematic effects are varied simultaneously across (M_R, R²) bins. However, to assess the effect of each systematic uncertainty individually, each one is varied by one standard deviation up and down. The effect on the background count and the signal efficiency in the signal region is shown in Table IV. The signal values are obtained by averaging over all mass points in the T1ttcc model (Δm = 25 GeV) plane. The PDF systematic uncertainties are obtained by running over 100 different members from the three PDF sets and fitting a Gaussian function to the efficiency distribution. The last line in the table corresponds to the full sampling of the systematic uncertainties; to obtain this value, we again fit a Gaussian function to the efficiency distribution obtained from the full systematic sampling, which includes 500 variations. Although the effects of some of these systematic uncertainties on the backgrounds are large, they do not influence our results greatly, because only ratios of simulated background counts, not their absolute values, enter the statistical analysis. Therefore, most of the systematic effects cancel. The statistical precision of the event counts in the control regions is the leading uncertainty in the background prediction for the search bins at large M_R or R². The dominant systematic uncertainty in the signal efficiency arises from the PDFs.
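The Z(→νν)+jets uncertainty in item (xi) can be sketched as follows. Each bin contributes a Gaussian centered on its data/simulation ratio, with a width equal to the ratio's uncertainty and a weight equal to the bin's event count. The band is then widened symmetrically around unity until it contains 68% of the superposed density. Reading "the 68% band around a ratio of unity" as a symmetric interval [1 − d, 1 + d] is an assumption made here, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def band_half_width(ratios, ratio_errs, counts, coverage=0.68, tol=1e-10):
    """Half-width d of the interval [1 - d, 1 + d] containing `coverage` of the
    superposed per-bin Gaussians (mean = data/simulation ratio, width = its
    uncertainty, weight = number of events in the bin)."""
    total = float(sum(counts))

    def contained(d):
        return sum(w / total * (normal_cdf(1.0 + d, r, s) - normal_cdf(1.0 - d, r, s))
                   for r, s, w in zip(ratios, ratio_errs, counts))

    lo, hi = 0.0, 1.0
    while contained(hi) < coverage:
        hi *= 2.0
    while hi - lo > tol:  # contained(d) increases with d, so bisection converges
        mid = 0.5 * (lo + hi)
        if contained(mid) < coverage:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Single bin with ratio 1.0 +- 0.1: d is about 0.0994 (68% two-sided Gaussian z = 0.9945)
print(round(band_half_width([1.0], [0.1], [10]), 4))  # 0.0994
```

Bins whose ratios deviate from unity, or that carry larger uncertainties, spread the superposed density and widen the band. Bins with more events pull the result more strongly.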

X. RESULTS AND INTERPRETATION
Our background predictions for each bin in the (M_R, R²) plane are presented in Fig. 10 and in Table V, which also lists the observed event yield in each bin. The background predictions are given as the mean and standard deviation of the background prior π(θ) described in Sec. VIII. The observed event yields agree with the predicted backgrounds from SM processes, and no evidence of a signal is observed.
We interpret our results in terms of the simplified model spectra T1ttcc and T1t1t, whose diagrams are shown in Fig. 1. Each model has three mass parameters: the gluino, top squark, and LSP masses. The gluino mass is varied between 600 and 1300 GeV and the LSP mass between 1 and 500 GeV, while the mass difference Δm between the top squark and the LSP is fixed at 10, 25, or 80 GeV for the T1ttcc model, and at 175 GeV for the T1t1t model. In both models, the gluino is assumed to decay 100% of the time into a top squark and a top quark.
To illustrate the expected signal sensitivity, we show in Fig. 11 the signal efficiencies as a function of the gluino and neutralino masses for the T1ttcc model, to which this analysis is particularly sensitive, and for the T1t1t model. Efficiencies of up to 6% are reached in the most boosted regimes. For the T1ttcc model, a drop in efficiency is observed in the region of model parameter space with the lowest neutralino mass (m(χ̃⁰₁) = 1 GeV), which can be explained by Lorentz boosts. For LSP masses higher than the mass of the charm quark, the LSP carries most of the momentum. For the bins with the lowest LSP mass, however, the LSP and the charm quark have about equal mass, so that after the boost they share the momentum about equally. This results in a softer E_T^miss spectrum and therefore lower R² values, which reduces the efficiency substantially. Figure 12 shows the observed 95% confidence level (CL) upper limit on the signal cross section as a function of the gluino and neutralino masses, obtained using the CLs method described briefly in Sec. VIII, for the T1t1t model and for the T1ttcc model with Δm = 10, 25, and 80 GeV. The figure also shows contours corresponding to the observed and expected lower limits, including their uncertainties, on the gluino and neutralino masses. This analysis has made significant inroads into the parameter space of the T1ttcc model. Gluinos with masses up to about 1.1 TeV have been excluded for neutralinos with masses less than about 400 GeV when the top squark decays to a charm quark and a neutralino and Δm < 80 GeV. This also means that top squarks with masses up to about 400 GeV have been excluded for small mass differences with the LSP, given the existence of a gluino with a mass less than about 1.1 TeV. Similarly, for the T1t1t model, top squarks with masses up to about 300 GeV have been excluded for scenarios with Δm = 175 GeV and gluino masses less than 700 GeV. The observed limit for this model is lower than the expected limit because of a small excess in the low-M_R bins with 0.12 ≤ R² < 0.16, which are among the most sensitive bins for the T1t1t model.

TABLE V. Event yields for the predicted backgrounds and for the data in each of the signal bins in R² and M_R. The uncertainties in the predictions are the combined statistical and systematic uncertainties obtained using the sampling procedure described in the text.

XI. SUMMARY
We have presented a search for new physics in hadronic final states with at least one boosted W boson and a b-tagged jet, using data binned at high values of the razor kinematic variables M_R and R². The analysis uses 19.7 fb⁻¹ of 8 TeV proton-proton collision data collected by the CMS experiment. The SM backgrounds are estimated using control regions in data. Scale factors, derived from simulation, connect these control regions to the signal region. The observations are consistent with the SM expectation, as shown in Fig. 10 and Table V. The results, which are encapsulated in a binned likelihood, are interpreted in terms of supersymmetric models describing pair production of heavy gluinos decaying to boosted top quarks. Limits are set on the gluino and neutralino masses using the CLs criterion in the gluino-neutralino mass plane, as shown in Fig. 12. Assuming that the gluino always decays into a top squark and a top quark, this analysis excludes gluino masses up to 1.1 TeV for top squarks with masses up to about 450 GeV that decay exclusively to a charm quark and a neutralino. In this scenario, the mass difference considered between the top squark and the neutralino is less than 80 GeV. This analysis also excludes gluino masses up to 700 GeV when the top squark decays solely to a top quark and a neutralino and the mass difference between the top squark and the neutralino is close to the top quark mass.

ACKNOWLEDGMENTS
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies:

[19] CMS Collaboration, Search for top-squark pair production in the single-lepton final state in pp collisions at √s = 8 TeV, Eur. Phys. J. C 73, 2677 (2013).
[43] CMS Collaboration, Search for supersymmetry using razor variables in events with b-tagged jets in pp collisions at √s = 8 TeV, Phys. Rev. D 91, 052018 (2015).