Measurement of the top quark mass in the dileptonic t t-bar decay channel using the mass observables M[bl], M[T2], and M[blnu] in pp collisions at sqrt(s) = 8 TeV

A measurement of the top quark mass (M[t]) in the dileptonic t t-bar decay channel is performed using data from proton-proton collisions at a center-of-mass energy of 8 TeV. The data were recorded by the CMS experiment at the LHC and correspond to an integrated luminosity of 19.7 +/- 0.5 inverse femtobarns. Events are selected with two oppositely charged leptons (l = e, mu) and two jets identified as originating from b quarks. The analysis is based on three kinematic observables whose distributions are sensitive to the value of M[t]. An invariant mass observable, M[bl], and a 'stransverse mass' observable, M[T2], are employed in a simultaneous fit to determine the value of M[t] and an overall jet energy scale factor (JSF). A complementary approach is used to construct an invariant mass observable, M[blnu], that is combined with M[T2] to measure M[t]. The shapes of the observables, along with their evolution in M[t] and JSF, are modeled by a nonparametric Gaussian process regression technique. The sensitivity of the observables to the value of M[t] is investigated using a Fisher information density method. The top quark mass is measured to be 172.22 +/- 0.18 (stat) +0.89/-0.93 (syst) GeV.


Introduction
The top quark mass is a fundamental parameter of the standard model (SM) and an important component in global electroweak fits evaluating the self-consistency of the SM [1]. In addition, the value of M_t has implications for the stability of the SM electroweak vacuum due to the role of the top quark in the quartic term of the Higgs potential [2]. Measurements of M_t have been conducted by the CDF and D0 experiments at the Tevatron, and by the ATLAS and CMS experiments at the CERN LHC. These measurements are typically calibrated against the top quark mass parameter in Monte Carlo (MC) simulation. Studies suggest that this parameter can be related to the top quark mass in a theoretically well-defined scheme with a precision of about 1 GeV [3]. A combination of measurements including all four experiments and tt̄ decay channels with zero, one, or two high-p_T electrons or muons (all-hadronic, semileptonic, and dileptonic, respectively) gives a top quark mass of 173.34 ± 0.36 (stat) ± 0.67 (syst) GeV [4]. Currently, the most precise experimental determination of M_t is provided by CMS using a combination of measurements in all tt̄ decay channels, yielding a value of 172.44 ± 0.13 (stat) ± 0.47 (syst) GeV [5]. In the dileptonic tt̄ decay channel, the ATLAS [6] and CMS [5] Collaborations have recently determined M_t to be 172.99 ± 0.41 (stat) ± 0.74 (syst) GeV and 172.82 ± 0.19 (stat) ± 1.22 (syst) GeV, respectively. This paper presents a reanalysis of the dileptonic tt̄ data set recorded in 2012, with the primary motivation of reducing the systematic uncertainties in the M_t determination.
The dileptonic top quark pair (tt̄) decay topology, tt̄ → (bℓ⁺ν)(b̄ℓ⁻ν̄), with ℓ = e, µ, presents a challenge for mass measurement arising primarily from the presence of two neutrinos in the final state. While the undetected p_T of a single final-state neutrino in a semileptonic tt̄ decay can be inferred from the momentum imbalance in the event, the allocation of the momentum imbalance between the two neutrinos in a dileptonic tt̄ decay is unknown a priori. For this reason, the dileptonic tt̄ system is kinematically underconstrained, and the mass cannot easily be determined on an event-by-event basis. Instead, the mass of the parent top quarks in the dileptonic tt̄ system can be extracted from kinematic features over an ensemble of events, with the help of appropriate observables and reconstruction techniques.
The measurement reported in this paper is based on a set of observables that have been proposed specifically for mass reconstruction in underconstrained decay topologies. These observables include the invariant mass, M_bℓ, of a bℓ system; a 'stransverse mass' variable, M_T2^bb, constructed with the b and b̄ daughters of the tt̄ system [7–9]; and the invariant mass of a bℓν system, M_bℓν, where the neutrino momentum is estimated by the M_T2-assisted on-shell (MAOS) reconstruction technique [10]. The MAOS reconstruction technique builds on M_T2 by exploiting the neutrino momentum estimates that are by-products of the M_T2 algorithm. The sensitivity of the M_bℓ, M_T2^bb, and M_bℓν observables to the value of M_t is investigated using a Fisher information density method. Distributions of M_bℓ and M_T2^bb in dileptonic events contain a sharp edge descending to a kinematic endpoint, the location of which is sensitive to the value of M_t. Recently, the masses of the top quark, W boson (M_W), and neutrino (M_ν) were extracted in a simultaneous fit using the endpoints of these distributions in dileptonic tt̄ events [11]. The M_bℓ, M_T2^bb, and MAOS M_bℓν observables are described in more detail in Section 4.
One of the dominant sources of systematic uncertainty limiting the precision of this measurement is the overall uncertainty in the jet energy scale (JES). To address the JES uncertainty, we introduce a technique that uses the M_bℓ and M_T2^bb observables to determine an overall jet energy scale factor (JSF) simultaneously with the top quark mass, where the JSF is defined as a multiplicative factor scaling the four-vectors of all jets in the event. Similar techniques have been developed for the all-hadronic and semileptonic tt̄ channels, where the jet pair originating from a W boson decay is used to determine the JSF [5]. Because light-quark jets from the W boson decay are used to calibrate the energy scale of b jets arising from the t and t̄ decays, these methods are sensitive to flavor-dependent uncertainties that emerge from differences in the response of b jets and light-quark jets. In the method featured here, the JSF is determined in the dileptonic tt̄ channel without relying on a W boson decaying to jets. Instead, sensitivity to the JSF is achieved through the kinematic differences between b jets, which are subject to JSF scaling, and leptons, which are not. Because it does not use light quarks from a hadronic W boson decay, this approach is insensitive to flavor-dependent JES uncertainties.
To model the M_bℓ, M_T2^bb, and MAOS M_bℓν distribution shapes, we use a Gaussian process (GP) regression technique [12, 13]. This technique is nonparametric, and thus largely model-independent. It is effective in modeling distribution shapes when no theoretical guidance is available to specify a functional form, and the shapes can conveniently be modeled as functions of multiple variables. In this analysis, three variables are used: the value of the relevant observable (M_bℓ, M_T2^bb, or M_bℓν), M_t, and the JSF. The shapes are determined using simulated events generated with seven different values of M_t ranging from 166.5 to 178.5 GeV, and with five values of JSF, ranging from 0.97 to 1.03, applied to the jets in each event. Each shape thus models the distribution of an observable together with its evolution in M_t and JSF.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. The tracker has a track-finding efficiency of more than 99% for muons with transverse momentum p_T > 1 GeV and pseudorapidity |η| < 2.4. The ECAL is a fine-grained hermetic calorimeter with quasi-projective geometry, and is distributed in the barrel region of |η| < 1.48 and in two endcaps that extend up to |η| < 3.0. The HCAL barrel and endcaps similarly cover the region |η| < 3.0. In addition to the barrel and endcap detectors, CMS has extensive forward calorimetry. Muons are measured in gas-ionization detectors, which are embedded in the steel flux-return yoke outside the solenoid. The silicon tracker and muon systems play a crucial role in the identification of jets originating from the hadronization of b quarks [14]. Events of interest are selected using a two-tiered trigger system [15]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to less than 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used, can be found in Ref. [16].
The particle-flow (PF) algorithm reconstructs each particle in an event by combining information from various subdetectors of CMS. Each event is required to have at least one reconstructed collision vertex, with the primary vertex selected as the one with the largest ∑p_T² of associated tracks. Electron candidates are reconstructed by matching a cluster of energy deposits in the ECAL to a reconstructed track [20]. They are required to satisfy p_T > 20 GeV and |η| < 2.5. Muon candidates are reconstructed in a global fit that combines information from the silicon tracker and muon system [21], and must have p_T > 20 GeV and |η| < 2.4. A requirement on the relative isolation is imposed inside a cone of radius

$$ \Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} $$

around each lepton candidate, where φ is the azimuthal angle in radians. The relative isolation is defined as I_rel = ∑_i p_T^i / p_T^ℓ, where the sum includes all reconstructed PF candidates inside the cone (excluding the lepton itself) and p_T^ℓ is the lepton p_T. Electron (muon) candidates are required to have I_rel < 0.15 (0.20) within a cone of ∆R < 0.3 (0.4). Events selected offline are required to contain exactly two such leptons (ee, eµ, or µµ) with opposite charge. For events containing an e⁺e⁻ or µ⁺µ⁻ pair, contributions from low-mass resonances are suppressed by requiring the invariant mass of the lepton pair to satisfy M_ℓℓ > 20 GeV, while contributions from Z boson decays are suppressed by requiring |M_Z − M_ℓℓ| > 15 GeV, where M_Z = 91.2 GeV [22].
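The lepton requirements above can be summarized in a short sketch. It assumes a hypothetical dictionary representation of leptons and PF candidates (keys `pt`, `eta`, `phi`, `flavor`, `charge`) and takes the dilepton invariant mass as an input. It illustrates the quoted cuts only and is not the CMS reconstruction software.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def relative_isolation(lepton, candidates, cone):
    """I_rel: scalar pT sum of PF candidates inside the cone (lepton excluded) / lepton pT."""
    total = sum(c["pt"] for c in candidates
                if c is not lepton
                and delta_r(lepton["eta"], lepton["phi"], c["eta"], c["phi"]) < cone)
    return total / lepton["pt"]

# Requirements quoted in the text: (min pT [GeV], max |eta|, max I_rel, cone size)
CUTS = {"e": (20.0, 2.5, 0.15, 0.3), "mu": (20.0, 2.4, 0.20, 0.4)}

def passes_lepton_id(lepton, candidates):
    """Kinematic acceptance plus flavor-dependent relative isolation."""
    pt_min, eta_max, irel_max, cone = CUTS[lepton["flavor"]]
    return (lepton["pt"] > pt_min and abs(lepton["eta"]) < eta_max
            and relative_isolation(lepton, candidates, cone) < irel_max)

def passes_dilepton_selection(l1, l2, m_ll, m_z=91.2):
    """Opposite charges; same-flavor pairs also need M_ll > 20 GeV and |M_Z - M_ll| > 15 GeV."""
    if l1["charge"] * l2["charge"] >= 0:
        return False
    if l1["flavor"] == l2["flavor"]:
        return m_ll > 20.0 and abs(m_z - m_ll) > 15.0
    return True
```

The isolation cone is evaluated around each lepton separately, so an event passes only if both leptons satisfy `passes_lepton_id` and the pair satisfies `passes_dilepton_selection`.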
Hadronic jets are clustered from PF candidates with the infrared and collinear safe anti-k_T algorithm [23], with a distance parameter R of 0.5, as implemented in the FASTJET package [24]. The jet momentum is determined as the vectorial sum of all particle momenta in the jet. Corrections to the JES and jet energy resolution (JER) are derived using MC simulation and are confirmed with measurements of the energy balance in quantum chromodynamics (QCD) dijet, QCD multijet, photon+jet, and Z+jet events [25]. Muons, electrons, and charged hadrons originating from multiple collisions within the same or nearby bunch crossings (pileup) are not included in the jet reconstruction. Contributions from neutral hadrons originating from pileup are estimated and subtracted from the JES. Jets originating from the hadronization of b quarks are identified with the combined secondary vertex (CSV) b tagging algorithm [14], which combines information from the jet secondary vertex with the impact parameter significances of its constituent tracks. The algorithm yields a tagging efficiency of approximately 85% and a misidentification rate of 10%. Events are required to contain at least two jets that pass the b tagging algorithm and satisfy p_T > 30 GeV and |η| < 2.5. In this analysis, the two jets satisfying these requirements that have the highest CSV discriminator values are referred to as b jets.
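The b jet choice described above can be sketched as follows, assuming hypothetical jet dictionaries with `pt`, `eta`, and `csv` keys. The CSV working point is not quoted in the text, so `csv_threshold` is a placeholder parameter.

```python
def select_b_jets(jets, csv_threshold):
    """Return the two jets with the highest CSV value among those passing
    pT > 30 GeV, |eta| < 2.5, and the (placeholder) b tagging threshold.
    Returns None when fewer than two jets qualify, i.e. the event is rejected."""
    tagged = [j for j in jets
              if j["pt"] > 30.0 and abs(j["eta"]) < 2.5 and j["csv"] > csv_threshold]
    if len(tagged) < 2:
        return None
    return sorted(tagged, key=lambda j: j["csv"], reverse=True)[:2]
```

Ranking by discriminator value rather than by p_T means that, in events with more than two tagged jets, the jets most likely to be genuine b jets are kept.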
The missing transverse momentum vector is defined as p⃗_T^miss = −∑_i p⃗_T,i, where the sum includes all reconstructed PF candidates in an event [26]. Its magnitude is referred to as p_T^miss. Corrections to the JES and JER are propagated into p⃗_T^miss, as well as an offset correction that accounts for pileup interactions. An additional correction mitigates a mild azimuthal dependence observed in the reconstructed p⃗_T^miss, which arises from imperfect detector alignment and other effects. To further suppress contributions from Drell-Yan processes, events containing an e⁺e⁻ or µ⁺µ⁻ pair are required to have p_T^miss > 40 GeV.

Simulated tt̄ signal events are generated with the MADGRAPH 5.1.5.11 matrix-element generator [27], combined with MADSPIN to include spin correlations of the top quark decay products [28], PYTHIA 6.426 with the Z2* tune for parton showering [29], and TAUOLA for the decay of τ leptons [30]. Parton distribution functions (PDFs) are described by the CTEQ6L1 set [31]. The tt̄ signal events are generated with seven different values of M_t ranging from 166.5 to 178.5 GeV. The contribution from single top quark production in association with a W boson (tW) is simulated with POWHEG 1.380 [32–35], where the value of M_t is assumed to be 172.5 GeV. Background events from W+jets and Z+jets production are generated with MADGRAPH 5.1.3.30, and contributions from WW, WZ, and ZZ processes are simulated with PYTHIA. The CMS detector response to the simulated events is modeled with GEANT4 [36]. All background processes are normalized to their predicted cross sections [37–41].
With the requirements outlined previously, 41 640 tt̄ candidate events are selected in data. The sample composition is estimated in simulation to be 95% dileptonic tt̄, 4% single top quark, and 1% other processes, including diboson, W+jets, and Drell-Yan production, as well as semileptonic and all-hadronic tt̄ decays.

Observables
The observables featured in this study have been developed for physics scenarios in which undetected particles, such as neutrinos, carry away a portion of the kinematic information necessary for full event reconstruction. In the dileptonic tt̄ system, the distributions of these observables contain endpoints, edges, and peak regions that are sensitive to the top quark mass. The observables are described in more detail below.

The M_bℓ observable
The M_bℓ observable is defined as

$$ M_{b\ell} = \sqrt{(p_b + p_\ell)^2}, \tag{1} $$

where p_b and p_ℓ are the four-vectors of a b jet and a lepton, respectively. The bℓ pairs underlying each value of M_bℓ are chosen out of four possible combinations by an algorithm described below. The M_bℓ observable contains a kinematic endpoint that occurs when the b jet and lepton are back-to-back in the top quark rest frame. The location of this endpoint, (M_bℓ)_max, is a function of the masses involved in the decay (neglecting the b quark and lepton masses):

$$ (M_{b\ell})_{\max} = \sqrt{\frac{(M_t^2 - M_W^2)(M_W^2 - M_\nu^2)}{M_W^2}}. \tag{2} $$

With M_t = 172.5 GeV, M_W = 80.4 GeV [22], and M_ν = 0, this gives (M_bℓ)_max = 152.6 GeV. Although this endpoint is a theoretical maximum on the value of M_bℓ at leading order, events are still observed beyond it because of background contamination, resolution effects, and nonzero particle widths.
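As a quick numerical check of the endpoint formula for a massless b jet and lepton, the snippet below evaluates (M_bℓ)_max for the quoted masses and shows how the endpoint moves across the simulated M_t range. The function name is illustrative.

```python
import math

def mbl_endpoint(m_top, m_w, m_nu=0.0):
    """(M_bl)_max = sqrt((M_t^2 - M_W^2)(M_W^2 - M_nu^2) / M_W^2), massless b and lepton."""
    return math.sqrt((m_top**2 - m_w**2) * (m_w**2 - m_nu**2) / m_w**2)

print(round(mbl_endpoint(172.5, 80.4), 1))  # 152.6
for m_top in (166.5, 172.5, 178.5):          # range of simulated M_t values
    print(m_top, round(mbl_endpoint(m_top, 80.4), 1))
```

Near M_t = 172.5 GeV the endpoint shifts by roughly M_t/(M_bℓ)_max ≈ 1.1 GeV per GeV of M_t, which is why the edge region carries most of the mass sensitivity.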
The M_bℓ distribution in data and MC simulation is shown in Fig. 1 (left), with a breakdown of signal and background events shown in the simulation. The 'signal' category includes tt̄ dilepton decays in which both b jets are correctly identified by the b tagging algorithm. The background categories include: 'mistag' dilepton decays, where a light-quark or gluon jet is incorrectly selected by the b tagging algorithm; 'τ decays', where dilepton events include at least one τ lepton in the final state that subsequently decays leptonically; and 'hadronic decays', which include events where at least one of the top quarks decays hadronically. The 'non-tt̄ bkg' category consists of single top quark, diboson, W+jets, and Drell-Yan processes. Events in which a top quark decays through a τ lepton contain extra neutrinos from the leptonic τ decay. Although the extra neutrinos cause a small distortion of the kinematic distributions, these events still contribute to the sensitivity of the measurement.

… that are not sensitive, such as the stationary point where the three shapes intersect. To provide a quantitative description of these effects, we introduce a 'local shape sensitivity' function, also known as the Fisher information density, shown in Figs. 1, 3, and 4. This function conveys the sensitivity of an observable at a specific point on its shape. For the M_bℓ observable, the local shape sensitivity function peaks near the kinematic endpoint (M_bℓ ≈ 150 GeV) and vanishes at the stationary point (M_bℓ ≈ 105 GeV). The integral of this function over its range is proportional to 1/σ²_{M_t}, where σ_{M_t} is the statistical uncertainty in a measurement of M_t. A full description of the local shape sensitivity function is given in Appendix A.
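The sketch below illustrates the idea of a local shape sensitivity, assuming the standard Fisher information density i(x) = (∂f/∂M_t)² / f for a shape f(x | M_t) normalized in x; the exact definition in Appendix A is not reproduced here. It uses a hypothetical Gaussian toy shape whose mean tracks M_t, for which the integrated information is 1/width², so N events give σ_{M_t} ≈ 1/√(N ∫ i(x) dx).

```python
import math

def fisher_density(shape, x, m_t, h=0.01):
    """Local shape sensitivity at x: (df/dM_t)^2 / f, via a central finite difference."""
    f = shape(x, m_t)
    if f <= 0.0:
        return 0.0
    dfdm = (shape(x, m_t + h) - shape(x, m_t - h)) / (2.0 * h)
    return dfdm * dfdm / f

def total_information(shape, m_t, x_lo, x_hi, n=4000):
    """Integrate the density over [x_lo, x_hi] with the midpoint rule."""
    dx = (x_hi - x_lo) / n
    return sum(fisher_density(shape, x_lo + (i + 0.5) * dx, m_t) for i in range(n)) * dx

def toy_shape(x, m_t, width=10.0):
    """Hypothetical toy: a unit-normalized Gaussian in x whose mean equals M_t."""
    return math.exp(-0.5 * ((x - m_t) / width) ** 2) / (width * math.sqrt(2.0 * math.pi))

info = total_information(toy_shape, 172.5, 72.5, 272.5)   # about 1 / width^2
n_events = 40000
sigma_mt = 1.0 / math.sqrt(n_events * info)                 # expected stat. uncertainty (GeV)
print(info, sigma_mt)
```

For this toy the density vanishes at x = M_t, where shapes for neighboring M_t values cross, which is the analog of the stationary point described above.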

b jet and lepton combinatorics
The two b jets and two leptons stemming from each tt̄ decay give rise to a two-fold matching ambiguity, with two correct and two incorrect bℓ pairings possible in each event. Pairings in which the b jet and lepton emerge from different top quarks do not necessarily obey the upper bound described in Eq. (2), and thus do not have a clean kinematic endpoint in M_bℓ. Although it is experimentally difficult to distinguish a priori between correct and incorrect pairings, one possible approach is to select the two smallest M_bℓ values in each event. This preserves the kinematic endpoint of the distribution: even if the two smallest M_bℓ values do not correspond to the correct pairings, they are guaranteed to lie at or below the values of the correct pairings, which do respect the endpoint. In this analysis, we employ a slightly more sophisticated matching technique, introduced in Ref. [11], in which either two or three bℓ pairs are selected in each event.
By selecting either two or three bℓ pairs in each event, the technique employed in this analysis gains statistical power while preserving the kinematic endpoint of M_bℓ. Although the selected pairs are not necessarily the correct ones, the corresponding M_bℓ values are guaranteed by construction to be less than or equal to those of the correct pairs. The matching technique is based on the following prescription:
1. match each b jet with the lepton that produces the lower M_bℓ value;
2. match each lepton with the b jet that produces the lower M_bℓ value.
This recipe produces either two or three values of M_bℓ. In the latter case, two different leptons are paired with the same b jet, and vice versa. Such a configuration highlights the difference between this recipe and the simpler approach of choosing the two smallest values of M_bℓ, which do not necessarily involve both b jets and both leptons in the event; this occurs, for example, if both b jets are matched to a single lepton. In such cases, the next largest M_bℓ value is also needed to ensure that both b jets and both leptons from the event are used.
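The two matching rules can be written as a short function. The 2×2 input layout `mbl[jet][lepton]` is an assumption for illustration; the function returns the set of selected (jet, lepton) index pairs, which has two or three entries.

```python
def select_bl_pairs(mbl):
    """Two-rule b-lepton matching.

    mbl[jet][lep] holds M_bl for b jet `jet` and lepton `lep` (indices 0 or 1).
    Returns the set of selected (jet, lep) pairs.
    """
    pairs = set()
    for jet in range(2):   # rule 1: each b jet takes the lepton giving the lower M_bl
        best_lep = min(range(2), key=lambda lep: mbl[jet][lep])
        pairs.add((jet, best_lep))
    for lep in range(2):   # rule 2: each lepton takes the b jet giving the lower M_bl
        best_jet = min(range(2), key=lambda j: mbl[j][lep])
        pairs.add((best_jet, lep))
    return pairs

# Both b jets prefer lepton 0; the two smallest values (90, 100) would both use
# lepton 0, so rule 2 adds the pair (jet 0, lepton 1) to involve lepton 1 as well.
mbl = [[100.0, 120.0], [90.0, 140.0]]
selected = select_bl_pairs(mbl)
print(sorted(selected), [mbl[j][l] for j, l in sorted(selected)])
```

Because the set union removes duplicates, agreement between the two rules yields two pairs, and disagreement yields three.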

The M_T2 observable
The M_T2 'stransverse mass' observable [7, 8] is based on the transverse mass, M_T. The transverse mass of the W boson in a W → ℓν decay is given by

$$ M_T^2 = m_\ell^2 + m_\nu^2 + 2\left(E_{T\ell}\, E_{T\nu} - \vec{p}_{T\ell} \cdot \vec{p}_{T\nu}\right), \tag{3} $$

where E²_Tx = m²_x + p²_Tx for x ∈ {ℓ, ν}, m_x is the particle mass, and p_Tx is the particle momentum projected onto the plane perpendicular to the beams. This quantity exhibits a kinematic endpoint at the parent mass, M_W, which occurs in configurations where both the lepton and neutrino momenta lie entirely in the transverse plane (up to a common longitudinal boost). The dileptonic tt̄ system has two layers of decays, with t → Wb in the first step followed by W → ℓν in the second. The result is an event topology with two identical branches, t → bℓ⁺ν and t̄ → b̄ℓ⁻ν̄, each with a visible (bℓ) and an invisible (ν) component. In this case, one value of M_T can be computed for each branch. The invisible particle momentum associated with each branch, however, is not known. While for a semileptonic tt̄ decay, with only one W → ℓν decay, the neutrino p_T is estimated from the p⃗_T^miss in the event, a dileptonic tt̄ decay includes two neutrinos, for which the allocation of p⃗_T^miss is unknown.
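The transverse mass defined above translates directly into code. In this sketch, transverse momenta are passed as (p_x, p_y) tuples in GeV; the function name is illustrative.

```python
import math

def transverse_mass(m_vis, pt_vis, m_inv, pt_inv):
    """M_T for a visible and an invisible particle; pt_* are (px, py) tuples in GeV.
    M_T^2 = m_vis^2 + m_inv^2 + 2 (E_T,vis E_T,inv - pT,vis . pT,inv)."""
    et_vis = math.sqrt(m_vis**2 + pt_vis[0]**2 + pt_vis[1]**2)
    et_inv = math.sqrt(m_inv**2 + pt_inv[0]**2 + pt_inv[1]**2)
    dot = pt_vis[0] * pt_inv[0] + pt_vis[1] * pt_inv[1]
    mt_sq = m_vis**2 + m_inv**2 + 2.0 * (et_vis * et_inv - dot)
    return math.sqrt(max(mt_sq, 0.0))   # guard against tiny negative rounding

# Massless lepton and neutrino back to back in the transverse plane, 40 GeV each:
print(transverse_mass(0.0, (40.0, 0.0), 0.0, (-40.0, 0.0)))  # 80.0
```

The back-to-back example saturates the endpoint: a W boson decaying at rest with both daughters in the transverse plane gives M_T = 2 × 40 GeV.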
The M_T2 observable is an extension of M_T for a system with two identical decay branches, 'a' and 'b', such as those in the dileptonic tt̄ system. Here, the invisible particle momenta, p⃗_T^a and p⃗_T^b, must add up to the total p⃗_T^miss. The strategy of M_T2 is to impose this constraint on the invisible particle momenta while performing a minimization that preserves the kinematic endpoint of M_T. For a general event with a symmetric decay topology, M_T2 is defined as

$$ M_{T2} = \min_{\vec{p}_T^{\,a} + \vec{p}_T^{\,b} = \vec{p}_T^{\,\text{miss}}} \left\{ \max\left[ M_T^{a},\, M_T^{b} \right] \right\}, \tag{4} $$

where M_T^a and M_T^b correspond to the two decay branches. If the invisible particle mass is known, it can be incorporated into the M_T2 calculation as well, yielding an endpoint at the parent particle mass. Although the final values of p⃗_T^a and p⃗_T^b are typically treated as intermediate quantities in the M_T2 algorithm, they are employed as neutrino p_T estimates in the MAOS reconstruction technique described in Section 4.3.
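For illustration, the M_T2 minimization can be carried out by brute force: scan the invisible momentum assigned to branch a, give branch b the remainder of p⃗_T^miss, and refine the grid around the best point. This is a sketch under simple assumptions (a common invisible mass, a ±500 GeV initial search window); dedicated M_T2 algorithms are faster and more robust, and the analysis software is not reproduced here. For the leptonic subsystem one would pass the leptons as visible particles with m_inv = 0; for M_T2^bb, the b jets with m_inv = M_W and, as the "missing" momentum, the total W boson p_T, i.e. p⃗_T^miss plus the two lepton p⃗_T. The returned split is the quantity MAOS reuses as neutrino p_T estimates.

```python
import math

def m_t(m_vis, vis, m_inv, inv):
    """Transverse mass of one branch; vis and inv are (px, py) tuples in GeV."""
    et_v = math.sqrt(m_vis**2 + vis[0]**2 + vis[1]**2)
    et_i = math.sqrt(m_inv**2 + inv[0]**2 + inv[1]**2)
    sq = m_vis**2 + m_inv**2 + 2.0 * (et_v * et_i - vis[0] * inv[0] - vis[1] * inv[1])
    return math.sqrt(max(sq, 0.0))

def mt2(vis_a, m_a, vis_b, m_b, ptmiss, m_inv=0.0, half=500.0, steps=41, rounds=12):
    """Brute-force M_T2: minimize max(M_T^a, M_T^b) over splits pT^a + pT^b = ptmiss.

    Each round scans a (steps x steps) grid of pT^a centered on the current best
    point, then shrinks the search window by a factor of 4.
    Returns (M_T2, pT^a, pT^b).
    """
    def objective(q):
        rest = (ptmiss[0] - q[0], ptmiss[1] - q[1])
        return max(m_t(m_a, vis_a, m_inv, q), m_t(m_b, vis_b, m_inv, rest))

    best_q = (0.5 * ptmiss[0], 0.5 * ptmiss[1])   # start from an even split
    best_val = objective(best_q)
    for _ in range(rounds):
        cx, cy = best_q
        step = 2.0 * half / (steps - 1)
        for ix in range(steps):
            for iy in range(steps):
                q = (cx - half + ix * step, cy - half + iy * step)
                val = objective(q)
                if val < best_val:
                    best_val, best_q = val, q
        half /= 4.0
    return best_val, best_q, (ptmiss[0] - best_q[0], ptmiss[1] - best_q[1])

# Dilepton example: massless leptons as visible particles, massless neutrinos as children.
value, nu_a, nu_b = mt2((40.0, 10.0), 0.0, (-30.0, 20.0), 0.0, (15.0, 25.0))
print(value, nu_a, nu_b)
```

Since the true neutrino momenta are one admissible split, the minimum can never exceed the larger of the two true transverse masses, which is the property that keeps the endpoint intact.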
Figure 2: The M_T2 subsystems in the dileptonic tt̄ event topology.

The M_T2 subsystems
In the tt̄ system, there are several ways in which M_T2 can be computed, depending on how the decay products are grouped. The M_T2 algorithm classifies them into three categories: upstream, visible, and child particles [42]. The child particles are those at the end of the decay chain that are unobservable or are simply treated as unobservable; in the latter case, their momenta are added to the p⃗_T^miss vector. The visible particles are those whose p_T values are measured and used in the calculation, and the upstream particles are those from further up in the decay chain, including any initial-state radiation (ISR) accompanying the hard collision. In general, the child, visible, and upstream particles may be collections of objects, creating three possible subsystems in the dileptonic tt̄ event topology. These subsystems are illustrated in Fig. 2. For simplicity, we refer to the corresponding M_T2 observables as M_T2^bb, M_T2^ℓℓ, and M_T2^bℓ, where:
• The M_T2^ℓℓ observable uses the two leptons as visible particles, treats the neutrinos as invisible child particles, and combines the b jets with all other upstream particles in the event.
• The M_T2^bb observable uses the b jets as visible particles and treats the W bosons as child particles, ignoring the fact that their charged daughter leptons are observable. It considers only ISR jets as sources of upstream momentum.
• The M_T2^bℓ observable combines a b jet and a lepton to form a single visible system on each branch, and takes the neutrinos as the invisible particles. A two-fold matching ambiguity results from the pairing of b jets with leptons in each event. To preserve the kinematic endpoint of the M_T2^bℓ distribution, the bℓ pairing that gives the smaller value of M_T2^bℓ is used in each event.
The subsystem observable M_T2^bb is employed in this study to complement the M_bℓ observable. The M_T2^bb observable contains an endpoint at the value of M_t and can be combined with M_bℓ to mitigate uncertainties due to the JES. This feature is discussed further in Section 5. The distribution of M_T2^bb and its sensitivity to the value of M_t are shown in Fig. 3. Although M_T2^ℓℓ is not directly sensitive to M_t, the neutrino p_T estimates that are a by-product of its computation are used as input to the MAOS M_bℓν reconstruction technique described in Section 4.3.

… is required to lie outside the opening angle between the two b jet p_T vectors in the event. This requirement primarily affects events at low values of M_T2^bb, and its effect on the statistical sensitivity of the observable is small.

The MAOS M_bℓν observable
The MAOS reconstruction technique employed in this analysis is based on the subsystem observable M_T2^ℓℓ. In the M_T2^ℓℓ algorithm, an M_T variable, defined in Eq. (3), is constructed from the ℓ⁺ν and ℓ⁻ν̄ pairs corresponding to each of the tt̄ decay branches. Because the neutrino p_T values are unknown, a minimization is performed in Eq. (4) over possible values consistent with the measured p⃗_T^miss in each event.
The MAOS technique employs the neutrino p_T values determined by the M_T2^ℓℓ minimization to construct full bℓν invariant mass estimates corresponding to each of the tt̄ decay branches. Given the neutrino p_T values, the remaining z components of their momenta are obtained by enforcing the W boson on-shell requirement [22],

$$ M_W^2 = (p_\ell + p_\nu)^2. \tag{5} $$

This yields a longitudinal momentum for each neutrino given by

$$ p_{z\nu} = \frac{1}{E_{T\ell}^2}\left( A\, p_{z\ell} \pm E_\ell \sqrt{A^2 - E_{T\ell}^2\, E_{T\nu}^2} \right), \tag{6} $$

where A = (M_W² − m_ℓ² − m_ν²)/2 + p⃗_Tℓ · p⃗_Tν [10]. Given these estimates of the neutrino three-momenta together with M_ν = 0, we have the four-vectors required to construct an M_bℓν invariant mass corresponding to the decay products of each top quark. The quadratic equation underlying the W boson on-shell requirement, solved in Eq. (6), provides up to two solutions for each neutrino p_z, yielding a two-fold ambiguity for each neutrino momentum. In addition, there is a two-fold ambiguity from matching the b jets to the ℓν pairs in the construction of the bℓν invariant masses. No matching ambiguity exists between leptons and neutrinos, since the ℓ⁺ν and ℓ⁻ν̄ pairs have been fixed by the M_T2^ℓℓ algorithm. The combined four-fold ambiguity, together with the two top quark decays in each event, gives up to eight possible values of M_bℓν. In the measurement, all of the available values are used: for each ℓν pair, this includes up to two neutrino p_zν solutions and two b jet matches. The distribution of MAOS M_bℓν and its sensitivity to the value of M_t are shown in Fig. 4.
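The longitudinal-momentum solution can be sketched as follows, assuming lepton four-vectors given as (E, p_x, p_y, p_z) tuples and a neutrino p_T taken from the M_T2^ℓℓ minimization. When the discriminant is negative this sketch simply returns no real solution; how such events are treated in the analysis is not specified here. Each returned p_z, combined with either b jet, gives one M_bℓν value.

```python
import math

def maos_neutrino_pz(lep, nu_pt, m_w=80.4, m_nu=0.0):
    """Neutrino p_z solutions of the W on-shell condition (p_l + p_nu)^2 = M_W^2.

    lep   : lepton four-vector (E, px, py, pz) in GeV
    nu_pt : neutrino (px, py), e.g. from the M_T2 minimization
    Returns [] or two real solutions (equal when the discriminant is zero).
    """
    e_l, px_l, py_l, pz_l = lep
    et_l2 = e_l * e_l - pz_l * pz_l                      # E_T,l^2 = m_l^2 + pT,l^2
    m_l2 = et_l2 - px_l * px_l - py_l * py_l
    et_nu2 = m_nu * m_nu + nu_pt[0] ** 2 + nu_pt[1] ** 2
    a = 0.5 * (m_w * m_w - m_l2 - m_nu * m_nu) + px_l * nu_pt[0] + py_l * nu_pt[1]
    disc = a * a - et_l2 * et_nu2
    if disc < 0.0:
        return []                                          # no real solution
    root = e_l * math.sqrt(disc)
    return [(a * pz_l + root) / et_l2, (a * pz_l - root) / et_l2]
```

If the neutrino p_T were exactly the true one, one of the two roots would reproduce the true p_z; with the M_T2 estimate, both roots are kept, as described above.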

Simultaneous determination of M_t and JSF
To mitigate the impact of JES uncertainties on the precision of this measurement, we introduce a technique that allows a JSF parameter to be fit simultaneously with M_t. The JSF is a constant multiplicative factor that calibrates the overall energy scale of reconstructed jets. It is applied in addition to the standard JES calibration, which corrects the jet response as a function of p_T and η. The dominant component of the uncertainty in the JES calibration can be attributed to a global factor in the jet response, which is captured by the JSF.
The challenge in determining the JSF simultaneously with M_t stems from the large degree of correlation between these parameters. In the top quark decay, t → bℓν, the JSF directly affects the momentum of the b jet and, indirectly, the inferred momentum of the neutrino, by scaling all jets entering the p⃗_T^miss sum. The M_t parameter affects the momenta of these two particles as well as that of the lepton produced in the top quark decay. In terms of distribution shapes, variations in the M_t and JSF parameters cause shape changes that are difficult to distinguish. For this reason, a shape-based analysis using a single observable can determine either M_t or the JSF, but not both simultaneously.
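A minimal sketch of how a JSF acts on an event, assuming jets given as (E, p_x, p_y, p_z) tuples: every jet four-vector is multiplied by the JSF, and because p⃗_T^miss is minus the vector sum of all PF candidates, it shifts by −(JSF − 1) times the summed jet p⃗_T. This shows how the JSF reaches the neutrino side of the decay only through p⃗_T^miss. The function name is hypothetical.

```python
def apply_jsf(jets, met, jsf):
    """Scale jet four-vectors by a common JSF and propagate the change into pT^miss.

    jets : list of (E, px, py, pz) tuples in GeV
    met  : (px, py) of the missing transverse momentum vector
    Since pT^miss = -(vector sum of all PF candidate pT), scaling the jets by jsf
    shifts pT^miss by -(jsf - 1) times the summed jet pT.
    """
    scaled = [tuple(jsf * c for c in jet) for jet in jets]
    sum_px = sum(jet[1] for jet in jets)
    sum_py = sum(jet[2] for jet in jets)
    new_met = (met[0] - (jsf - 1.0) * sum_px, met[1] - (jsf - 1.0) * sum_py)
    return scaled, new_met
```

Leptons are left untouched, which is exactly the asymmetry between b jets and leptons that the simultaneous fit exploits.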
To determine M_t and the JSF simultaneously, we construct a likelihood function from two distributions, corresponding to the M_bℓ and M_T2^bb observables. In this configuration, variations in the parameters produce shifts in each individual distribution. They also create a relative shift between the distributions that provides the additional constraint needed for a simultaneous fit of M_t and JSF. The dependence of the M_bℓ and M_T2^bb distribution shapes on M_t is shown in Figs. 1 and 3, and their dependence on the JSF is shown in Fig. 5. The difference in the response of the M_bℓ and M_T2^bb shapes to the JSF is rooted in the reconstructed objects underlying the two observables: each value of M_bℓ uses one b jet and one lepton, whereas each value of M_T2^bb uses two b jets and no leptons in the visible system. Thus, M_T2^bb exhibits a stronger dependence on the JSF. The likelihood fit used in this measurement is described in more detail in Section 7.

Gaussian processes for shape estimation
In this analysis, the M_bℓ, M_T2^bb, and M_bℓν distribution shapes are modeled with a GP regression technique that has two main advantages over other commonly used shape estimation methods. First, the GP shape is nonparametric, determined only by a set of training points and hyperparameters that regulate smoothing; second, it can easily be trained as a function of several variables simultaneously. The latter feature makes it possible to capture the smooth evolution of the distribution shapes as the M_t and JSF parameters are varied. A detailed introduction to GPs can be found in Refs. [12, 13]. Here, we give a brief overview of the GP regression technique, with further discussion provided in Appendix B.
The likelihood fit described in Section 7 uses distribution shapes of the form f(x | M_t, JSF), where x is the value of an observable (M_bℓ, M_T2^bb, or M_bℓν), and M_t and JSF are free parameters in the fit. The shapes f are shown in Figs. 1, 3, and 4 for each observable, with the free parameters set to M_t = 166.5, 172.5, or 178.5 GeV and JSF = 1. In Fig. 5, shapes corresponding to the M_bℓ and M_T2^bb observables are shown with the free parameters set to M_t = 172.5 GeV and JSF = 0.97, 1.00, or 1.03. In the figures, these shapes are represented as functions of a single variable (the observable x) with M_t and JSF fixed. In GP regression, however, each shape is treated as a function of all three quantities (x, M_t, and JSF), that is, as a probability density in x defined over a three-dimensional space.
Each GP shape is trained using binned distributions of the observable x in MC simulation. For each observable, 35 binned distributions are used, corresponding to seven values of M_t^MC ranging from 166.5 to 178.5 GeV and five values of JSF ranging from 0.97 to 1.03. Each distribution has 75 bins in x, yielding a total of 2625 training points at which the value of f is known and used as input to the GP regression. Each training point is specified by its values of x, M_t, and JSF. The GP regression technique interpolates between the discrete values of x, M_t, and JSF covered by these training points to provide a shape that is smooth over its range. The smoothness properties of each shape are determined by a kernel function chosen by the analyzer; the GP shapes in this analysis use the kernel function given in Eq. (18) of Appendix B.
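To make the regression step concrete, here is a small, self-contained GP sketch. It assumes a squared-exponential kernel with one length scale per input (x, M_t, JSF), a zero prior mean, hand-set hyperparameters, and a per-point noise term (for example, the MC statistical uncertainty of each bin); the kernel actually used is the one in Appendix B, and in practice the hyperparameters would be optimized. The pure-Python Cholesky solve scales as O(n³) and is meant only for toy-sized inputs; 2625 training points would call for a numerical linear algebra library.

```python
import math

def se_kernel(a, b, amp, lengths):
    """Squared-exponential covariance with one length scale per input dimension."""
    r2 = sum(((ai - bi) / ell) ** 2 for ai, bi, ell in zip(a, b, lengths))
    return amp * amp * math.exp(-0.5 * r2)

def cholesky(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def cho_solve(low, y):
    """Solve (L L^T) alpha = y by forward and then backward substitution."""
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(low[k][i] * alpha[k] for k in range(i + 1, n))) / low[i][i]
    return alpha

class GPShape:
    """GP posterior mean of a shape f(x, M_t, JSF) learned from training points."""

    def __init__(self, inputs, values, amp, lengths, noise):
        self.inputs, self.amp, self.lengths = inputs, amp, lengths
        cov = [[se_kernel(a, b, amp, lengths) for b in inputs] for a in inputs]
        for i, sigma in enumerate(noise):
            cov[i][i] += sigma * sigma          # per-point noise on the diagonal
        self.alpha = cho_solve(cholesky(cov), values)

    def __call__(self, point):
        """Posterior mean k(point, X) . alpha at a new (x, M_t, JSF) point."""
        return sum(w * se_kernel(point, p, self.amp, self.lengths)
                   for w, p in zip(self.alpha, self.inputs))
```

Each training point here is an (x, M_t, JSF) bin center with its normalized bin content as the value, mirroring the 75 × 7 × 5 grid described above at a much smaller scale.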
The binned distributions used to construct each GP shape are normalized to unity. However, the normalization of the GP shape itself may deviate slightly from unity due to minor imperfections in shape modeling. To mitigate this effect, the GP shape normalization is recomputed for each value of M_t and JSF at which the shape is evaluated. In a likelihood fit, the normalization is recomputed for every variation of the fit parameters.

Fit strategy
This measurement employs an unbinned maximum-likelihood fit using the M_bℓ, M_T2^bb, and MAOS M_bℓν observables described in Section 4, along with the GP shape estimation technique described in Section 6. The MC samples used to train the GP shapes include the tt̄ signal and background processes described in Section 3.
The likelihood constructed from a single observable, x, is given by

$$ \mathcal{L}(M_t, \text{JSF}) = \prod_i f(x_i \mid M_t, \text{JSF}). \tag{7} $$

Here, the distribution shape f depends on the values of the free parameters M_t and JSF, and expresses the likelihood of drawing an event i in which the observable has the value x_i. It is normalized to unity over its range for all values of M_t and JSF. The parameters M_t and JSF are varied in the fit to maximize the likelihood.
A likelihood containing two observables, x_1 and x_2, is constructed as the product of the individual likelihoods:

$$ \mathcal{L}(M_t, \text{JSF}) = \mathcal{L}_{x_1}(M_t, \text{JSF})\; \mathcal{L}_{x_2}(M_t, \text{JSF}). \tag{8} $$

This analysis employs three different versions of the likelihood fit:
1. the 1D fit uses the M_bℓ and M_T2^bb observables to determine M_t, with the JSF constrained to unity;
2. the 2D fit also uses M_bℓ and M_T2^bb, but imposes no constraint on the JSF and determines M_t and the JSF simultaneously;
3. the MAOS fit uses the M_T2^bb and M_bℓν observables to determine M_t, with the JSF constrained to unity.
Among these versions, the 1D fit provides the best statistical precision on the value of M_t. The 2D fit mitigates the JES uncertainties, which are the largest source of systematic uncertainty in the 1D approach. The MAOS fit is expected to yield results similar to those of the 1D fit, and is presented as a viable alternative that replaces the M_bℓ observable with MAOS M_bℓν. The best overall precision on M_t is given by a combination of the 1D and 2D fits, which is discussed below. The fit results are discussed in Section 9.
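The fit structure can be sketched with a grid scan standing in for a numerical minimizer. Each shape is renormalized at every (M_t, JSF) point, mirroring the renormalization of the GP shapes, and the negative log-likelihood of several observables is the sum of the per-observable terms, as in the product likelihood. The Gaussian toy shapes, ranges, and grid values are hypothetical; restricting `jsf_grid` to (1.0,) corresponds to fits with the JSF fixed to unity, while a wider grid gives a 2D scan.

```python
import math
import random

def normalized(shape, m_t, jsf, x_lo, x_hi, n=2000):
    """Return x -> f(x | M_t, JSF) rescaled to unit integral over [x_lo, x_hi]."""
    dx = (x_hi - x_lo) / n
    norm = sum(shape(x_lo + (i + 0.5) * dx, m_t, jsf) for i in range(n)) * dx
    return lambda x: shape(x, m_t, jsf) / norm

def neg_log_likelihood(samples, shapes, ranges, m_t, jsf):
    """-ln L, where L is the product over observables k and values i of f_k(x_ki | M_t, JSF)."""
    total = 0.0
    for values, shape, (lo, hi) in zip(samples, shapes, ranges):
        f = normalized(shape, m_t, jsf, lo, hi)
        total -= sum(math.log(f(x)) for x in values)
    return total

def fit_grid(samples, shapes, ranges, mt_grid, jsf_grid=(1.0,)):
    """Return the (M_t, JSF) grid point with the smallest -ln L."""
    return min(((mt, jsf) for mt in mt_grid for jsf in jsf_grid),
               key=lambda p: neg_log_likelihood(samples, shapes, ranges, p[0], p[1]))

# Toy usage (hypothetical, unnormalized shapes): two observables whose peaks track M_t.
def toy_a(x, m_t, jsf):
    return math.exp(-0.5 * ((x - 0.85 * m_t) / 12.0) ** 2)

def toy_b(x, m_t, jsf):
    return math.exp(-0.5 * ((x - 0.75 * m_t) / 15.0) ** 2)

rng = random.Random(1)
data = [[rng.gauss(0.85 * 172.5, 12.0) for _ in range(2000)],
        [rng.gauss(0.75 * 172.5, 15.0) for _ in range(2000)]]
grid = [170.0 + 0.1 * k for k in range(51)]
best_mt, best_jsf = fit_grid(data, [toy_a, toy_b], [(50.0, 250.0), (30.0, 230.0)], grid)
print(best_mt, best_jsf)
```

A grid scan keeps the sketch dependency-free; in practice a gradient-based or simplex minimizer would replace `fit_grid`.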
The central values and statistical uncertainties of $M_t$ and JSF are determined using the bootstrapping technique [43]. This method is based on pseudo-experiments rather than on the shape of the total likelihood defined in Eq. (8) near its maximum. As a result, it mitigates the effects of correlations between the two observables, $x_1$ and $x_2$, in the likelihood. The technique also mitigates possible correlations within the $M_{b\ell}$ and $M_{b\ell\nu}$ observables when multiple values of the observable occur in a single event. The bootstrapping technique is primarily relevant for determining the statistical uncertainty, which may otherwise be affected by correlations in the likelihood. It has a negligible impact on the central values of $M_t$ and JSF. The bootstrap pseudo-experiments are constructed by resampling the full data set with replacement. The size of each pseudo-experiment is fixed to the number of events in data (41 640 events). Events are selected at random from the full data set, so that a particular event has the same probability of being chosen at any stage of the sampling process. In this procedure, a single event may be selected more than once in any given pseudo-experiment. In data, all events have an equal probability of being selected. In simulation, the probability of selecting a particular event is proportional to its weight. This weight contains the relevant cross sections as well as corrections for MC modeling and object reconstruction efficiencies.
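The resampling step can be sketched with Python's standard `random.choices`, which draws with replacement and accepts optional per-event weights. In this sketch, `estimator` is a hypothetical placeholder for the full likelihood fit; here it is simply the sample mean of a toy observable.

```python
import random
import statistics


def bootstrap(events, estimator, n_pseudo, weights=None, seed=0):
    """Apply `estimator` to pseudo-experiments resampled with replacement.

    Each pseudo-experiment has the same size as `events`. With weights=None
    every event is equally likely (as for data); otherwise the selection
    probability is proportional to the event weight (as for simulation).
    Returns the mean and standard deviation of the estimates.
    """
    rng = random.Random(seed)
    size = len(events)
    estimates = [
        estimator(rng.choices(events, weights=weights, k=size))
        for _ in range(n_pseudo)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)


rng = random.Random(42)
toy_events = [rng.gauss(0.0, 1.0) for _ in range(500)]
central, stat_unc = bootstrap(toy_events, statistics.fmean, n_pseudo=300)
# stat_unc should be close to stdev(toy_events) / sqrt(500), roughly 0.045.
print(central, stat_unc)
```

The spread of the estimates across pseudo-experiments plays the role of the statistical uncertainty, so no assumption about the likelihood shape near its maximum is needed.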
The performance of the likelihood fitting approach described above is evaluated using simulated events, for which the true values of $M_t$ and JSF are known. For each version of the likelihood fit, the fit is conducted using seven different values of $M_t^{\text{MC}}$ ranging from 166.5 to 178.5 GeV. The results of this performance study are shown in Fig. 6. The likelihood fits are consistent with zero bias, showing that the GP shape modeling technique accurately captures the distribution shapes and their evolution over several values of $M_t^{\text{MC}}$. For this reason, no calibration of the fit is necessary for an unbiased determination of the $M_t$ and JSF parameters.
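The linearity check can be sketched as an ordinary least-squares fit of a line $y = ax + b$, as drawn in Fig. 6. In this sketch, the residual $M_t^{\text{fit}} - M_t^{\text{MC}}$ is fitted against $M_t^{\text{MC}} - 172.5$ GeV, so the offset reads directly as the bias at the reference mass. The input values are hypothetical, and a fit weighted by the error bars would be a natural refinement.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical generated masses and mean fitted masses (GeV).
m_mc = [166.5, 169.5, 172.5, 175.5, 178.5]
m_fit = [166.6, 169.4, 172.5, 175.6, 178.4]
residual = [fit - mc for fit, mc in zip(m_fit, m_mc)]
shifted = [mc - 172.5 for mc in m_mc]
slope, offset = fit_line(shifted, residual)
print(f"slope = {slope:.4f}, offset = {offset:.3f} GeV")
```

An unbiased fit gives a slope and offset both consistent with zero within their uncertainties.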

Combination of 1D and 2D fits
The 1D and 2D fits discussed above have differing sensitivities to the various sources of systematic uncertainty in this measurement. The 2D fit successfully mitigates the JES uncertainties, which dominate in the 1D fit. However, other uncertainties in the 2D method are larger and cause the total precision to worsen (Section 8). The best overall precision on the value of $M_t$ is provided by a hybrid fit, defined as a linear combination of the 1D and 2D fits. The measured value of $M_t$ in the hybrid fit is given by:

$$M_t^{\text{hyb}} = w_{\text{hyb}} \, M_t^{\text{1D}} + (1 - w_{\text{hyb}}) \, M_t^{\text{2D}}, \qquad (9)$$

where the parameter $w_{\text{hyb}}$ determines the relative weight of the 1D and 2D fits in the combination. The value of $M_t^{\text{hyb}}$ and its statistical uncertainty are extracted using bootstrap pseudo-experiments, as described above. In each pseudo-experiment, the measured value of $M_t^{\text{hyb}}$ is given by the linear combination in Eq. (9) of the measured $M_t^{\text{1D}}$ and $M_t^{\text{2D}}$ values. A value of $w_{\text{hyb}} = 0.8$ is found to achieve the best precision on $M_t$ when both statistical and systematic uncertainties are taken into account. The performance of the hybrid fit, evaluated using MC samples corresponding to seven values of $M_t^{\text{MC}}$, is shown in Fig. 6.
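The hybrid combination is a simple linear combination, $w_{\text{hyb}} M_t^{\text{1D}} + (1 - w_{\text{hyb}}) M_t^{\text{2D}}$, applied pseudo-experiment by pseudo-experiment. The short sketch below applies it to the 1D and 2D central values reported with the results (172.39 and 171.56 GeV). With $w_{\text{hyb}} = 0.8$, it reproduces the quoted hybrid central value of 172.22 GeV. The per-pseudo-experiment pairs are hypothetical.

```python
def hybrid(m_1d, m_2d, w_hyb=0.8):
    """Hybrid estimate w_hyb * M_1D + (1 - w_hyb) * M_2D."""
    return w_hyb * m_1d + (1.0 - w_hyb) * m_2d


# Central values of the 1D and 2D fits quoted in the results (GeV).
print(round(hybrid(172.39, 171.56), 2))  # 172.22

# Applied pseudo-experiment by pseudo-experiment (hypothetical fit outputs).
pairs_1d_2d = [(172.41, 171.20), (172.20, 172.05), (172.55, 171.70)]
hybrid_values = [hybrid(m1, m2) for m1, m2 in pairs_1d_2d]
print(hybrid_values)
```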

Systematic uncertainties
The systematic uncertainties evaluated in this measurement are given in Table 1. They include experimental effects from detector calibration and object reconstruction, as well as modeling effects mostly arising from the simulation of QCD processes. All uncertainties are determined by conducting the likelihood fit using events from MC simulation with the relevant parameters varied by $\pm 1\Delta$, where $\Delta$ is the uncertainty in a particular parameter. The resulting difference in the measured top quark mass ($\delta M_t$) or JSF ($\delta\mathrm{JSF}$) is taken as the corresponding systematic uncertainty. For uncertainties that are evaluated by comparing two or more independent MC samples, the values of $\delta M_t$ and $\delta\mathrm{JSF}$ may be subject to statistical fluctuations. For this reason, if the value of $\delta M_t$ or $\delta\mathrm{JSF}$ in a particular systematic variation is smaller than its statistical uncertainty, the statistical uncertainty is quoted as the systematic uncertainty. Finally, a systematic uncertainty may be one-sided, i.e., both the $+\Delta$ and $-\Delta$ variations produce $\delta M_t$ or $\delta\mathrm{JSF}$ shifts of the same sign. In that case, the larger shift is taken as the symmetric systematic uncertainty.
In the hybrid fit, the systematic uncertainties are evaluated according to the linear combination in Eq. (9). For each systematic variation, this gives

$$\delta M_t^{\text{hyb}} = w_{\text{hyb}} \, \delta M_t^{\text{1D}} + (1 - w_{\text{hyb}}) \, \delta M_t^{\text{2D}}.$$

The hybrid fit provides the smallest overall uncertainty. Its largest contributions stem from the JES, b quark fragmentation modeling, and hard scattering scale. The next most precise result is given by the 1D fit, which is dominated by the same sources of uncertainty. The JES uncertainties are successfully mitigated in the 2D fit. However, the 2D fit is more sensitive to the uncertainties in the top quark $p_T$ spectrum, matching scale, and underlying event tune, so its total systematic uncertainty is larger than that of the 1D fit. The MAOS fit also has a larger total systematic uncertainty than the 1D fit, due to its sensitivity to the JES, top quark $p_T$ spectrum, and b quark fragmentation modeling uncertainties. Further details on each source of systematic uncertainty are given below.
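The quoting rules described above can be sketched as follows. The helper names are hypothetical. Applying the statistical floor before testing for a one-sided variation is one reading of the procedure, not a documented ordering. Sources that are one-sided by construction and deliberately left unsymmetrized (such as the top quark $p_T$ reweighting) would bypass the symmetrization step.

```python
import math


def quoted_uncertainty(shift_up, shift_down, stat_unc):
    """Return (up, down) magnitudes quoted for one systematic source.

    shift_up / shift_down: shifts (e.g. delta M_t) from the +1 and -1 Delta
    variations; stat_unc: statistical uncertainty of those shifts.
    """
    def floored(shift):
        # A shift smaller than its statistical uncertainty is replaced by it.
        return shift if abs(shift) >= stat_unc else math.copysign(stat_unc, shift)

    up, down = floored(shift_up), floored(shift_down)
    if up * down > 0:
        # One-sided: both variations move the result the same way, so the
        # larger shift is taken as a symmetric uncertainty.
        larger = max(abs(up), abs(down))
        return larger, larger
    return abs(up), abs(down)


def hybrid_shift(shift_1d, shift_2d, w_hyb=0.8):
    """Propagate a signed systematic shift through the hybrid combination."""
    return w_hyb * shift_1d + (1.0 - w_hyb) * shift_2d


print(quoted_uncertainty(0.30, -0.20, 0.05))  # (0.3, 0.2)
print(quoted_uncertainty(0.30, 0.10, 0.05))   # (0.3, 0.3)
print(quoted_uncertainty(0.01, -0.02, 0.05))  # (0.05, 0.05)
```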
• Jet energy scale: The JES uncertainty is evaluated separately for four components, which are then added in quadrature [44]. The 'Intercalibration' uncertainty arises from the modeling of radiation in the $p_T$- and $\eta$-dependent JES determination. The 'In situ' category includes uncertainties stemming from the determination of the absolute JES using $\gamma$/Z+jet events. The 'Uncorrelated' uncertainty includes uncertainties due to detector effects and pileup. Finally, the 'Flavor' uncertainty stems from differences in the energy response between different jet flavors. It is a linear sum of contributions from the light quark, charm quark, bottom quark, and gluon responses. Each contribution is estimated by comparing the Lund string fragmentation in PYTHIA [29] with the cluster fragmentation in HERWIG++ [45] for the corresponding type of jet. All JES uncertainties are propagated into the reconstructed $p_T^{\text{miss}}$ in each event.
• b quark fragmentation: The b quark fragmentation uncertainty includes two components that are implemented using event weights. The first component stems from the b quark fragmentation function. This function can be modeled using the Lund fragmentation model in the PYTHIA Z2* tune, or tuned to empirical results from the ALEPH [46] and DELPHI [47] experiments. This component is evaluated by comparing the measurement results in MC simulation using these two tunes of the b quark fragmentation function, with the difference symmetrized to obtain the corresponding uncertainty. The second component stems from the B hadron semileptonic branching fraction, which has an impact on the b quark JES due to the production of a neutrino. It is evaluated by varying the semileptonic branching fractions according to their measured values from B hadron decays and their uncertainties [22]. Both components are combined in quadrature to obtain the total uncertainty.
• Jet energy resolution: The jet energy resolution in data is known to be worse than in MC simulation. This effect is corrected with a set of scale factors that are used to smear the jet four-vectors and broaden their resolution. The scale factors are determined in bins of $\eta$. Here, they are varied within their uncertainties, which are typically 2.5-5%. The effect of these variations is also propagated into the $p_T^{\text{miss}}$.
• Unclustered energy: The unclustered energy in each event comprises the low-$p_T$ hadronic activity that is not clustered into a jet. Here, the scale of the unclustered energy is varied by $\pm 10\%$ [26].
• Pileup: The uncertainty in the number of pileup interactions in MC simulation stems from the instantaneous luminosity in each bunch crossing and the effective inelastic cross section. In this analysis, the number of pileup interactions in MC simulation is reweighted to match the data. The pileup uncertainty is evaluated by varying the effective inelastic cross section by $\pm 5\%$.
• Lepton energy scale: The electron energy scale is varied up and down by 0.6% in the ECAL barrel ($|\eta| < 1.48$) and by 1.5% in the ECAL endcap ($1.48 < |\eta| < 3.0$) [20]. The muon momentum scale is varied up and down by 0.2%. All variations are propagated into the $p_T^{\text{miss}}$.
• Lepton identification and isolation: Event weights are applied to adjust the electron and muon yields in MC simulation. These weights account for differences in the identification and isolation efficiencies between data and simulation. For muons, the uncertainty is taken to be 0.5% of the identification event weight and 0.2% of the isolation event weight [21]. For electrons, the uncertainties are estimated in bins of $p_T$ and $\eta$, and are approximately 0.1-0.5% of the combined event weight for identification and isolation [20].
• b tagging efficiency: Event weights are applied to adjust the b jet yields in MC simulation to account for the difference in the b tagging efficiency between data and MC simulation [14]. The uncertainties are evaluated in bins of $p_T$ and $\eta$.
• Top quark $p_T$ reweighting: Event weights can be applied to compensate for a difference in the top quark $p_T$ spectrum between data and MC simulation [48]. The weights are not applied in the nominal result. The uncertainty is evaluated by comparing the measurement in MC simulation with and without the weights applied. This uncertainty is one-sided by construction and is not symmetrized.
• Hard scattering scale: The factorization scale, $\mu_F$, determines the threshold separating the parton-parton hard scattering from softer interactions embodied in the PDFs. The renormalization scale, $\mu_R$, sets the energy scale at which matrix-element calculations are evaluated. Both scales are set to $\mu_F = \mu_R = Q$ in the matrix-element calculation and the initial-state parton shower of the MADGRAPH samples, where

$$Q^2 = M_t^2 + \sum p_T^2.$$

Here, the sum runs over all additional final-state partons in the matrix element. The values of $\mu_F$ and $\mu_R$ are varied simultaneously up and down by a factor of two to estimate the corresponding uncertainty.
• Matching scale: The matrix-element/parton-shower matching threshold is used to interface the matrix elements generated in MADGRAPH with the parton showers simulated in PYTHIA. Its reference value of 20 GeV is varied up and down by a factor of two.
• Underlying event tunes and color reconnection: The underlying event tunes affect the modeling of soft hadronic activity that results from beam remnants and multiparton interactions in each event. The measurement is conducted with a $t\bar{t}$ sample from MC simulation using the 'Perugia 2011' tune. It is compared to results using samples with the 'Perugia 2011 mpiHi' and 'Perugia 2011 Tevatron' tunes [49] in PYTHIA, corresponding to increased and decreased underlying event activity, respectively. The larger difference is symmetrized to obtain the final uncertainty. The color reconnection (CR) uncertainty is evaluated by comparing measurement results using $t\bar{t}$ samples with the 'Perugia 2011' and 'Perugia 2011 no CR' tunes [49]. CR effects are not included in the latter. The difference is symmetrized to obtain the final uncertainty.
• Matrix-element generator: The measurement is repeated using MC samples produced with the POWHEG event generator, which provides a next-to-leading-order calculation of $t\bar{t}$ production. These results are compared with those obtained from the reference $t\bar{t}$ MC sample, generated using MADGRAPH, to determine the corresponding uncertainty.
• Parton distribution functions: Initial-state partons are described by PDFs. The corresponding uncertainty is evaluated by applying event weights in the MC simulation to reflect the CT10 PDF set [50] with 50 error eigenvectors. The total PDF uncertainty is determined by adding the variations corresponding to these error sets in quadrature.
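The list above uses two combination rules. Independent components (the four JES categories, the two b quark fragmentation components, the PDF error sets) are added in quadrature. The JES 'Flavor' contributions are added linearly. The sketch below writes both rules compactly; all numerical inputs are placeholders, not values from Table 1.

```python
import math


def quadrature(*components):
    """Combine independent uncertainty components in quadrature."""
    return math.sqrt(sum(c * c for c in components))


def linear_sum(*components):
    """Combine fully correlated (signed) contributions linearly."""
    return sum(components)


# Placeholder per-flavor shifts (GeV): light quark, charm, bottom, gluon.
flavor = linear_sum(0.05, 0.02, 0.10, -0.03)
# Placeholder JES components: intercalibration, in situ, uncorrelated, flavor.
jes_total = quadrature(0.10, 0.30, 0.20, flavor)
# A PDF uncertainty built from many eigenvector variations uses the same rule.
pdf_total = quadrature(*[0.01] * 50)
print(f"flavor = {flavor:.2f}, JES = {jes_total:.3f}, PDF = {pdf_total:.3f}")
```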

Results and discussion
The results for each version of the likelihood fit, determined from 1000 bootstrap pseudo-experiments in each fit, are shown in Fig. 7. The 2D fit uses the $M_{b\ell}$ and $M_{T2}^{bb}$ observables to simultaneously determine the values of $M_t$ and JSF. It yields $M_t^{\text{2D}} = 171.56 \pm 0.46\,(\text{stat})\,^{+1.31}_{-1.25}\,(\text{syst})$ GeV and $\mathrm{JSF}^{\text{2D}} = 1.011 \pm 0.006\,(\text{stat})\,^{+0.015}_{-0.014}\,(\text{syst})$. The correlation between the $M_t$ and JSF fit parameters in the 2D fit is shown in Fig. 8, with a correlation coefficient of $\rho = -0.94$. The $M_{b\ell}$ and $M_{T2}^{bb}$ distribution shapes corresponding to the fit results in a typical pseudo-experiment are shown in Fig. 9. The 2D fit successfully mitigates the uncertainty due to the determination of the JES, which is otherwise the largest source of systematic uncertainty in this measurement. In particular, this approach is insensitive to the flavor-dependent component of the JES uncertainties, which stems from differences in the response between b jets, light-quark jets, and gluon jets. This insensitivity arises because predominantly b jets are used to determine both the $M_t$ and JSF parameters. The underlying strategy, rooted in a simultaneous fit of two distributions with differing sensitivities to the JSF, does not rely on any specific assumptions about the event topology or final state. For this reason, it can be a viable option for JES uncertainty mitigation in a variety of physics scenarios.
The 1D fit is also based on the $M_{b\ell}$ and $M_{T2}^{bb}$ observables, but constrains the JSF parameter to unity. The 1D fit gives $M_t^{\text{1D}} = 172.39 \pm 0.17\,(\text{stat})\,^{+0.91}_{-0.95}\,(\text{syst})$ GeV. In this approach, the JES is the largest source of uncertainty. However, other uncertainties are reduced with respect to the 2D fit, resulting in an improved overall precision.
The best overall precision is given by the hybrid fit, a linear combination of the 1D and 2D fit results. The 1D and 2D fits use the same set of events and an identical likelihood function constructed from the $M_{b\ell}$ and $M_{T2}^{bb}$ observables. These fits are fully correlated. The only difference between them is the treatment of the JSF parameter, which is fixed to unity in the 1D fit and acts as a free parameter in the 2D fit. The choice to fix the JSF parameter or allow it to float affects the sensitivity of the fit to a variety of uncertainty sources in addition to the JES. A linear combination of the 1D and 2D fits with $w_{\text{hyb}} = 0.8$, as defined in Eq. (9), achieves an optimal balance between all uncertainty sources, thus providing the best overall precision. The hybrid fit gives $M_t^{\text{hyb}} = 172.22 \pm 0.18\,(\text{stat})\,^{+0.89}_{-0.93}\,(\text{syst})$ GeV.
The correlation between the $M_t$ and JSF fit parameters in the hybrid fit is shown in Fig. 8, with a correlation coefficient of $\rho = -0.40$.
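A correlation coefficient such as those quoted for Fig. 8 can be estimated from the spread of the bootstrap fit results. The sketch below computes the Pearson coefficient. It is applied to hypothetical, anticorrelated toy $(M_t, \mathrm{JSF})$ pairs rather than the actual pseudo-experiments. In Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy pseudo-experiments: a larger fitted JSF pulls the fitted mass down.
rng = random.Random(3)
jsf = [1.0 + rng.gauss(0.0, 0.006) for _ in range(1000)]
mt = [172.0 - 70.0 * (j - 1.0) + rng.gauss(0.0, 0.15) for j in jsf]
print(round(pearson(mt, jsf), 2))
```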
The MAOS fit replaces the $M_{b\ell}$ observable with the $M_{b\ell\nu}$ invariant mass, yielding $M_t^{\text{MAOS}} = 171.54 \pm 0.19\,(\text{stat})\,^{+1.27}_{-1.02}\,(\text{syst})$ GeV. The MAOS observable presents a new approach to mass reconstruction in a decay topology characterized by underconstrained kinematics. Here, the MAOS fit provides a determination of $M_t$ that is complementary to the 2D, 1D, and hybrid fits. The MAOS $M_{b\ell\nu}$ distribution shape corresponding to the fit results in a typical pseudo-experiment is shown in Fig. 10. The results for each version of the likelihood fit are summarized in Fig. 11.

Summary
A measurement of the top quark mass ($M_t$) in the dileptonic $t\bar{t}$ decay channel is performed using proton-proton collisions at $\sqrt{s} = 8$ TeV, corresponding to an integrated luminosity of $19.7 \pm 0.5\ \text{fb}^{-1}$. The measurement is based on the mass observables $M_{b\ell}$, $M_{T2}^{bb}$, and $M_{b\ell\nu}$, which allow for mass reconstruction in decay topologies that are kinematically underconstrained. The sensitivity of these observables to the value of $M_t$ is investigated using a Fisher information density technique. The observables are employed in three versions of an unbinned likelihood fit. In each version, a Gaussian process technique is used to model the corresponding distribution shapes and their evolution in $M_t$ and an overall jet energy scale factor (JSF). The Gaussian process shapes are nonparametric, and allow for a likelihood fitting framework that gives unbiased results. The 2D fit provides the first simultaneous measurement of $M_t$ and JSF in the dileptonic channel. It is robust against uncertainties due to the determination of the jet energy scale. This includes the flavor-dependent uncertainty component arising from differences in the response between b jets, light-quark jets, and gluon jets. The fit yields $M_t = 171.56 \pm 0.46\,(\text{stat})\,^{+1.31}_{-1.25}\,(\text{syst})$ GeV and $\mathrm{JSF} = 1.011 \pm 0.006\,(\text{stat})\,^{+0.015}_{-0.014}\,(\text{syst})$. The most precise measurement of $M_t$ is given by a linear combination of this result with a fit in which the JSF is constrained to unity, yielding a value of $172.22 \pm 0.18\,(\text{stat})\,^{+0.89}_{-0.93}\,(\text{syst})$ GeV. This measurement achieves a 25% improvement in overall precision on $M_t$ compared to previous dileptonic channel analyses using the 2012 data set at CMS. The improvement can be attributed to a reduction of the systematic uncertainties in the measurement.

Figure 11: Summary of the 1D, 2D, hybrid, and MAOS likelihood fit results using the 2012 data set at $\sqrt{s} = 8$ TeV, corresponding to an integrated luminosity of $19.7 \pm 0.5\ \text{fb}^{-1}$. A recent dileptonic channel measurement using the 2012 data set and the most recent combination of $M_t$ measurements by CMS in all $t\bar{t}$ decay channels [5] are shown below the dashed line for reference.

A Statistical sensitivity of kinematic observables
The sensitivity of a kinematic observable to the value of a parameter such as $M_t$ can be quantified by its Fisher information [51,52]. The Fisher information of an observable is related to its likelihood function, $\mathcal{L}$, which we have introduced in Eq. (7) and reproduce here:

$$\mathcal{L}(m) = \prod_{i=1}^{N} f(x_i \mid m), \qquad (10)$$

where $f(x \mid m)$ is the distribution of observable $x$ normalized to unity over its range, $m$ is a free parameter, and $N$ is the number of observations of $x$. In this measurement, $x = M_{b\ell}$, $M_{T2}^{bb}$, or $M_{b\ell\nu}$, and $m = M_t$ or JSF. The number $N$ is a multiple of the total number of events. For simplicity, we consider the distribution shape $f$ as a function of only one free parameter. The Fisher information corresponding to the shape $f(x \mid m)$ is given by:

$$I(m) = \int \left( \frac{\partial \log f(x \mid m)}{\partial m} \right)^{2} f(x \mid m)\, dx. \qquad (11)$$

The quantity $I(m)$ provides a measure of the curvature of the likelihood near its maximum. It can be interpreted as the variance of the slope $\partial \log f(x \mid m)/\partial m$, known as the 'statistical score' of $f(x \mid m)$.
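The per-observation Fisher information, $I(m) = \int (\partial \log f/\partial m)^2 f\, dx = \int (\partial f/\partial m)^2 / f \, dx$, can be evaluated numerically for any callable shape. This sketch estimates $\partial f/\partial m$ with a central finite difference and integrates with the trapezoidal rule; both are assumptions, not the analysis implementation. For a toy Gaussian shape of width $\sigma$ centered at $m$, the exact result is $1/\sigma^2$, which provides a check.

```python
import math


def fisher_information(f, m, lo, hi, n=4000, dm=1e-3):
    """Per-observation Fisher information: integral of (df/dm)^2 / f dx.

    df/dm is a central finite difference with step dm; the integral uses
    the trapezoidal rule with n intervals on [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        fx = f(x, m)
        if fx <= 0.0:
            continue  # points with f = 0 contribute nothing
        dfdm = (f(x, m + dm) - f(x, m - dm)) / (2.0 * dm)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * dfdm * dfdm / fx
    return total * h


def gaussian_shape(x, m, sigma=15.0):
    """Toy shape: a unit-normalized Gaussian centered at m (hypothetical)."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


info = fisher_information(gaussian_shape, 172.5, 172.5 - 150.0, 172.5 + 150.0)
print(info, 1.0 / 15.0 ** 2)
```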
The Fisher information is related to the precision of a measurement by the Cramér-Rao bound:

$$\sigma_m \geq \frac{1}{\sqrt{N\, I(m)}}, \qquad (12)$$

where $\sigma_m$ is the statistical uncertainty in the parameter $m$. For a likelihood with large $N$, the shape of the likelihood near its maximum is approximately Gaussian, and the bound approaches an equality. This expression confirms the expected relationship $\sigma_m \propto 1/\sqrt{N}$ between the statistical uncertainty and the value of $N$. It also identifies the proportionality factor as the reciprocal of the square root of the Fisher information. It expresses the uncertainty $\sigma_m$ in terms of the total number of events, the shape $f$, and the derivative $\partial f/\partial m$. The Fisher information also provides a mathematical framework for quantifying the sensitivity of an observable at a specific point on its shape. In this analysis, the $M_{b\ell}$ and $M_{T2}^{bb}$ observables have kinematic endpoints at approximately $\sqrt{M_t^2 - M_W^2}$ and $M_t$, respectively. The MAOS $M_{b\ell\nu}$ observable is an invariant mass whose shape contains a peak near the value of $M_t$. Because these features depend on the value of $M_t$, the regions near the endpoints of $M_{b\ell}$ and $M_{T2}^{bb}$ and the peak of $M_{b\ell\nu}$ are expected to contribute significantly to the sensitivity of these observables. To relate these local features to the Fisher information, we consider the integral in Eq. (11) over the value of the observable $x$. Here, the integrand of the Fisher information can be interpreted as the contribution to the total sensitivity stemming from a specific value of $x$. Rewriting the integrand in a more convenient form, we define the 'local shape sensitivity' function:

$$s(x \mid m) = \frac{1}{f(x \mid m)} \left( \frac{\partial f(x \mid m)}{\partial m} \right)^{2}. \qquad (13)$$

This function is also known as the Fisher information density. It is shown for the $M_{b\ell}$, $M_{T2}^{bb}$, and MAOS $M_{b\ell\nu}$ observables in Figs. 1, 3, and 4, respectively, with $m = M_t$ and the JSF parameter fixed to unity. It is observed to peak near the kinematic endpoints of $M_{b\ell}$ and $M_{T2}^{bb}$, and on the left-side edge of $M_{b\ell\nu}$. The values of $x$ where $s(x \mid m) = 0$ coincide with the stationary points at which the distribution shapes in Figs. 1, 3, and 4 intersect, since $\partial f/\partial m = 0$ there. This reflects the fact that, in a likelihood fit, events with a value of $x$ near a stationary point make little or no contribution to the determination of $m$. In general, the shape of $s(x \mid m)$ for each observable establishes a link between the underlying kinematic properties of the observable and the regions of high and low sensitivity on its shape. In this analysis, it provides heuristic information about the $M_{b\ell}$, $M_{T2}^{bb}$, and MAOS $M_{b\ell\nu}$ distributions and their sensitivity to the value of $M_t$.
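The local shape sensitivity $s(x \mid m) = (\partial f/\partial m)^2 / f$ can be checked on a toy Gaussian shape of width $\sigma$ centered at $m$, a hypothetical stand-in for the GP shapes. For this shape, $s$ vanishes at the peak, where $\partial f/\partial m = 0$, and is largest at $x = m \pm \sqrt{2}\,\sigma$. The Cramér-Rao bound $\sigma_m \ge 1/\sqrt{N I(m)}$ then turns $I(m) = 1/\sigma^2$ into a minimum statistical uncertainty of $\sigma/\sqrt{N}$.

```python
import math


def gaussian_shape(x, m, sigma=15.0):
    """Toy shape: a unit-normalized Gaussian centered at m (hypothetical)."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def local_sensitivity(f, x, m, dm=1e-3):
    """s(x|m) = (df/dm)^2 / f, with df/dm from a central finite difference."""
    dfdm = (f(x, m + dm) - f(x, m - dm)) / (2.0 * dm)
    return dfdm * dfdm / f(x, m)


def cramer_rao_bound(fisher_info, n_obs):
    """Smallest attainable statistical uncertainty, 1 / sqrt(N * I(m))."""
    return 1.0 / math.sqrt(n_obs * fisher_info)


m, sigma = 172.5, 15.0
for u in (0.0, 1.0, math.sqrt(2.0), 2.0, 3.0):
    x = m + u * sigma
    print(f"x = m + {u:.2f} sigma: s = {local_sensitivity(gaussian_shape, x, m):.3e}")

# Per-observation information of this shape is 1/sigma^2, so with N = 10000:
print(cramer_rao_bound(1.0 / sigma**2, 10_000))  # about 0.15
```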
In addition to providing heuristic information, the local shape sensitivity function is used in this analysis to identify potential overfitting effects in the Gaussian process (GP) shapes. Overfitting occurs when the interpolation between GP training points is not smooth, causing fluctuations in the shape that may be difficult to identify by eye. Such fluctuations can bias both the determination of $M_t$ and its corresponding uncertainties. A typical symptom of overfitting is an underestimated statistical uncertainty in $M_t$. This can occur when fluctuations in the GP shape increase the magnitude of the slope $\partial f(x \mid M_t)/\partial M_t$ entering Eq. (11), thus artificially increasing the Fisher information of the corresponding shape. The issue is readily revealed by the shape of $s(x \mid M_t)$, which acquires visible fluctuations when overfitting is present. In such cases, overfitting can be mitigated by increasing the relevant GP length-scale hyperparameters to improve the smoothness of the GP shape.

B Gaussian process regression technique
The likelihood fit described in Section 7 uses distribution shapes of the form $f(x \mid M_t, \mathrm{JSF})$. Here, $x$ is the value of an observable ($M_{b\ell}$, $M_{T2}^{bb}$, or $M_{b\ell\nu}$), and $M_t$ and JSF are free parameters in the fit. In this analysis, the distribution shapes $f$ are modeled with a Gaussian process (GP) regression technique. We define a point, $u_i$, on each distribution shape by its position in $x$, $M_t$, and JSF:

$$u_i = (x_i, M_{t,i}, \mathrm{JSF}_i). \qquad (14)$$

The value of the shape at $u_i$ is given by $f(u_i) = f(x_i \mid M_{t,i}, \mathrm{JSF}_i)$. The point $u_i$ can be a training point, at which the value of $f$ is known and used as an input to the GP regression. Alternatively, it can be a test point, at which the value of $f$ is to be determined. Each GP shape is trained using binned distributions of the observable $x$ in MC simulation. For each observable, 35 binned distributions are used, corresponding to seven values of $M_t^{\text{MC}}$ ranging from 166.5 to 178.5 GeV and five values of JSF ranging from 0.97 to 1.03. Each distribution has 75 bins in $x$, yielding a total of $35 \times 75 = 2625$ training points. This binning scheme is chosen to model the distribution shapes accurately while mitigating the effects of statistical fluctuations. The GP regression technique interpolates between the discrete values of $x$, $M_t$, and JSF covered by these training points to provide a shape that is smooth over its range.
The 'Gaussian' in GP refers to the distribution of possible values of the shape $f$. The value at a single point, $f(u_i)$, is distributed according to a one-dimensional Gaussian function rather than being treated as an exact quantity. The mean of this Gaussian function is the most probable value of the shape at the point $u_i$, and it is the value used for likelihood fitting (Section 7). The variance stems from the modeling uncertainty inherent in the GP regression process. The values $f(u_i)$ and $f(u_j)$ at any two points follow a two-dimensional Gaussian distribution and are related by a covariance. The correlation between $f(u_i)$ and $f(u_j)$ determines the degree to which the GP shape is allowed to vary between the points $u_i$ and $u_j$. By extension, any $N$ values of the shape are described by an $N$-dimensional Gaussian distribution and are related by an $N \times N$ covariance matrix. To determine the value of the shape at a test point $u_{N+1}$, an $(N+1)$-dimensional Gaussian distribution is constructed relating the training point values $f(u_1), \ldots, f(u_N)$ to the test point value $f(u_{N+1})$. Then, $f(u_1), \ldots, f(u_N)$ are fixed to their known values. The $(N+1)$-dimensional Gaussian distribution is thereby reduced to a one-dimensional conditional Gaussian distribution representing the possible values of $f(u_{N+1})$.

To demonstrate this process graphically, we consider a simple GP with one training point, $u_{\text{train}}$, at which the value $f(u_{\text{train}})$ is known. There is also one test point, $u_{\text{test}}$, at which the value $f(u_{\text{test}})$ is to be evaluated. The values of $f(u_{\text{train}})$ and $f(u_{\text{test}})$ follow a two-dimensional Gaussian prior distribution with mean values $\mu_{\text{train}}$ and $\mu_{\text{test}}$, and a covariance represented by:

$$\Sigma = \begin{pmatrix} \sigma_{\text{train}}^2 & \rho\, \sigma_{\text{train}} \sigma_{\text{test}} \\ \rho\, \sigma_{\text{train}} \sigma_{\text{test}} & \sigma_{\text{test}}^2 \end{pmatrix}, \qquad (15)$$

where $\sigma_{\text{train}}^2$ and $\sigma_{\text{test}}^2$ are the variances of $f(u_{\text{train}})$ and $f(u_{\text{test}})$, and $\rho$ is the correlation coefficient. We set $\mu_{\text{train}} = \mu_{\text{test}} = 0$ to reflect our lack of prior knowledge of $f$ over its range. The resulting joint Gaussian distribution is represented by the contours in Fig. 12. To evaluate the shape $f$ at the test point, we fix $f(u_{\text{train}})$ to its known value, indicated by the square point in Fig. 12. The possible values of $f(u_{\text{test}})$ are then constrained to lie along the horizontal line, giving rise to the conditional Gaussian distribution indicated by the dashed curve. The mean of this conditional Gaussian distribution is taken to be the value of the shape at the test point.
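The one-training-point example reduces to the standard conditional distribution of a bivariate Gaussian. In the sketch below, whose function name is hypothetical, the conditional mean is $\mu_{\text{test}} + \rho\,(\sigma_{\text{test}}/\sigma_{\text{train}})\,(f(u_{\text{train}}) - \mu_{\text{train}})$. The conditional variance is $\sigma_{\text{test}}^2 (1 - \rho^2)$.

```python
def condition_on_training_point(f_train, sigma_train, sigma_test, rho,
                                mu_train=0.0, mu_test=0.0):
    """Mean and variance of f(u_test) once f(u_train) is fixed.

    The joint prior is a bivariate Gaussian with means (mu_train, mu_test)
    and covariance [[s_tr^2, rho*s_tr*s_te], [rho*s_tr*s_te, s_te^2]].
    """
    mean = mu_test + rho * (sigma_test / sigma_train) * (f_train - mu_train)
    var = sigma_test ** 2 * (1.0 - rho ** 2)
    return mean, var


# Strong correlation: the test value follows the training value closely.
print(condition_on_training_point(2.0, 1.0, 1.0, rho=0.9))  # mean 1.8, variance about 0.19
# No correlation: the prior (mean 0, variance 1) is returned unchanged.
print(condition_on_training_point(2.0, 1.0, 1.0, rho=0.0))  # (0.0, 1.0)
```

A stronger correlation pulls the test value toward the training value and narrows the conditional distribution. This is the mechanism by which the kernel controls how freely the shape can vary between points.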
In this analysis, the conditioning process described above is generalized to $N+1$ dimensions to accommodate all $N$ training points and one test point at which the shape $f$ is evaluated. The mean, $\mu_{N+1}$, and variance, $\sigma_{N+1}^2$, of $f$ at the test point $u_{N+1}$ are given by:

$$\mu_{N+1} = \mathbf{k}^{T} C_N^{-1}\, \mathbf{t}, \qquad (16)$$

$$\sigma_{N+1}^2 = c - \mathbf{k}^{T} C_N^{-1}\, \mathbf{k}, \qquad (17)$$

where $\mathbf{t}$ is a column vector containing the $f(u_i)$ values for all $N$ training points. The $i$th element of the vector $\mathbf{k}$, $k_i = \mathrm{cov}(f(u_i), f(u_{N+1}))$, is the covariance between the value of $f$ at the $i$th training point and the value at the test point, and $c = \mathrm{cov}(f(u_{N+1}), f(u_{N+1}))$. The matrix $C_N$, with elements $\mathrm{cov}(f(u_i), f(u_j))$, is the $N \times N$ covariance matrix expressing the joint Gaussian distribution of the values of $f$ at all $N$ training points. In this analysis, the value of $f$ at each point is given by the mean defined in Eq. (16). The variance in Eq. (17) is provided for completeness.
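The predictive mean $\mathbf{k}^T C_N^{-1}\mathbf{t}$ and variance $c - \mathbf{k}^T C_N^{-1}\mathbf{k}$ can be sketched in pure Python using a Cholesky factorization of $C_N$. This is numerically preferable to forming $C_N^{-1}$ explicitly. The kernel is passed in as a function, and a zero prior mean is assumed, as in the text. The one-dimensional squared-exponential kernel, the noise values, and the $\sin x$ training data are hypothetical choices for illustration.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(train_points, t, noise, test_point, kernel):
    """Conditional mean and variance of f at test_point (zero prior mean).

    C_N[i][j] = kernel(u_i, u_j), plus noise[i]**2 on the diagonal;
    mean = k^T C_N^{-1} t and var = c - k^T C_N^{-1} k.
    """
    n = len(train_points)
    C = [[kernel(train_points[i], train_points[j]) + (noise[i] ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k = [kernel(u, test_point) for u in train_points]
    c = kernel(test_point, test_point)
    L = cholesky(C)
    mean = sum(ki * ai for ki, ai in zip(k, cho_solve(L, t)))
    var = c - sum(ki * vi for ki, vi in zip(k, cho_solve(L, k)))
    return mean, var


def sq_exp(u, v, length=1.0):
    """Hypothetical 1D squared-exponential kernel with unit amplitude."""
    return math.exp(-0.5 * ((u[0] - v[0]) / length) ** 2)


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train = [(x,) for x in xs]
values = [math.sin(x) for x in xs]
noise = [1e-3] * len(xs)
print(gp_predict(train, values, noise, (1.25,), sq_exp), math.sin(1.25))
```

With a single training point and no noise, this reduces to the two-point conditioning formula with $\sigma_{\text{train}} = \sigma_{\text{test}} = 1$ and $\rho$ given by the kernel value.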
The covariance $\mathrm{cov}(f(u_i), f(u_j))$ between any two points is determined by a kernel function that is chosen by the analyzer. The kernel function defines the covariance matrix $C_N$ in Eqs. (16) and (17), and its properties determine the smoothness of the final shape. A conventional choice for the GP kernel function is a Gaussian, which ensures that the correlation between any two points is suppressed at large separation. In practice, the kernel is a three-dimensional function that controls the smoothness of the shape along $x$, $M_t$, and JSF. It also includes a correlation term between $M_t$ and JSF to reflect the kinematic relationship between them. The result is a product of a one-dimensional Gaussian (controlling the smoothness along $x$) with a two-dimensional Gaussian (controlling the smoothness along $M_t$ and JSF). For any two points $u_i$ and $u_j$ on the shape, the kernel is given by:

$$C(u_i, u_j) = N_1 \left[ \exp\!\left( -\frac{(x_i - x_j)^2}{2\theta_1^2} \right) \exp\!\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(M_{t,i} - M_{t,j})^2}{\theta_2^2} - \frac{2\rho\,(M_{t,i} - M_{t,j})(\mathrm{JSF}_i - \mathrm{JSF}_j)}{\theta_2 \theta_3} + \frac{(\mathrm{JSF}_i - \mathrm{JSF}_j)^2}{\theta_3^2} \right] \right) + N_2\, \sigma_i^2\, \delta_{ij} \right]. \qquad (18)$$

Here, $N_1$, $N_2$, $\theta_1$, $\theta_2$, $\theta_3$, and $\rho$ are the GP hyperparameters. The parameter $\sigma_i$ accounts for the statistical uncertainty in the distribution bin underlying each training point, and $\delta_{ij}$ is the Kronecker delta. The terms inside the exponentials specify the covariance between any two values of the shape as a function of their corresponding $x$, $M_t$, and JSF. The hyperparameters $\theta_1$, $\theta_2$, and $\theta_3$ specify the length scales over which the GP shape is allowed to vary, and $\rho$ is a correlation coefficient that couples the $M_t$ and JSF parameters. The hyperparameter $N_1$ specifies the overall normalization of the kernel function, and $N_2$ determines the relative normalization between the Gaussian and noise terms.
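The following sketches a kernel with the structure described above. It consists of a one-dimensional Gaussian in $x$, multiplied by a correlated two-dimensional Gaussian in $(M_t, \mathrm{JSF})$, plus a noise term added only on the diagonal. Several details are assumptions: the exact functional form used in the analysis, in particular how $N_2$ and $\sigma_i$ enter and the $1/(1-\rho^2)$ normalization of the correlated exponent. The hyperparameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    n1: float      # overall normalization N_1
    n2: float      # relative weight of the noise term N_2
    theta1: float  # length scale along x
    theta2: float  # length scale along M_t
    theta3: float  # length scale along JSF
    rho: float     # M_t-JSF correlation coefficient, |rho| < 1


def kernel(ui, uj, same_point, sigma_i, hp):
    """Covariance between f(u_i) and f(u_j) for points u = (x, M_t, JSF).

    same_point plays the role of the Kronecker delta: the noise term
    N_2 * sigma_i**2 is added only on the diagonal of C_N.
    """
    dx, dm, dj = (a - b for a, b in zip(ui, uj))
    along_x = math.exp(-0.5 * (dx / hp.theta1) ** 2)
    zm, zj = dm / hp.theta2, dj / hp.theta3
    q = (zm * zm - 2.0 * hp.rho * zm * zj + zj * zj) / (1.0 - hp.rho ** 2)
    along_mt_jsf = math.exp(-0.5 * q)
    noise = hp.n2 * sigma_i ** 2 if same_point else 0.0
    return hp.n1 * (along_x * along_mt_jsf + noise)


hp = Hyperparameters(n1=1.0, n2=1.0, theta1=20.0, theta2=3.0, theta3=0.02, rho=0.5)
u = (150.0, 172.5, 1.00)
v = (155.0, 173.5, 1.01)
print(kernel(u, v, False, 0.0, hp), kernel(v, u, False, 0.0, hp))
print(kernel(u, u, True, 0.1, hp))  # n1 * (1 + n2 * 0.1**2), i.e. 1.01 up to rounding
```

Small length scales make the covariance fall off quickly between neighboring training points. This is the regime in which the overfitting discussed in Appendix A can appear.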
The values of all hyperparameters are determined with the help of a cross-validation likelihood fit [12], conducted for each observable separately. The length-scale hyperparameters ($\theta_1$, $\theta_2$, and $\theta_3$) must be small enough for the GP shape to pass through the training points, and large enough for the shape to interpolate smoothly between them. Underestimated length scales satisfy the former criterion, but cause overfitting in the resulting GP shape. This creates a noisy interpolation between training points, and may bias the measured value of $M_t$ and its uncertainties. In this analysis, the GP shapes are checked for overfitting effects using the local shape sensitivity function described in Appendix A.

Figure 1: (Left) the $M_{b\ell}$ distribution in data and simulation with $M_t^{\text{MC}} = 172.5$ GeV, normalized to the number of events in the 8 TeV data set corresponding to an integrated luminosity of $19.7 \pm 0.5\ \text{fb}^{-1}$. The lower panel shows the ratio between the data and simulation. Statistical and systematic uncertainties in the simulated distribution are represented by the shaded area. A description of the systematic uncertainties is given in Section 8. (Right) the $M_{b\ell}$ distribution shapes in simulation, normalized to unit area, corresponding to three values of $M_t^{\text{MC}}$, shown together with the 'local shape sensitivity' function described in Appendix A. The $M_{b\ell}$ distributions include two or three values of $M_{b\ell}$ for each event. The distribution shapes are modeled with a GP regression technique, described in Section 6.

Figure 3: Following the conventions of Fig. 1, shown are the (left) $M_{T2}^{bb}$ distribution in data and simulation with $M_t^{\text{MC}} = 172.5$ GeV, and (right) the $M_{T2}^{bb}$ distribution shapes in simulation corresponding to three values of $M_t^{\text{MC}}$, along with the 'local shape sensitivity' function. The $M_{T2}^{bb}$ distributions include one value of $M_{T2}^{bb}$ for each event if it satisfies the kinematic requirement outlined in Section 4.2. The $M_{T2}^{bb}$ distribution employed in this analysis includes a kinematic requirement on the upstream momentum, defined as $\vec{p}_T^{\,\text{upst}} = \sum_{\text{reco}} \vec{p}_{T,i} - \sum_{\text{b jets}} \vec{p}_{T,i} - \sum_{\text{leptons}} \vec{p}_{T,i}$. The sums run over all reconstructed PF candidates, b jets, and leptons in each event, respectively. The direction of $\vec{p}_T^{\,\text{upst}}$

Figure 4: Following the conventions of Fig. 1, shown are the (left) MAOS $M_{b\ell\nu}$ distribution in data and simulation with $M_t^{\text{MC}} = 172.5$ GeV, and (right) the MAOS $M_{b\ell\nu}$ distribution shapes in simulation corresponding to three values of $M_t^{\text{MC}}$, along with the 'local shape sensitivity' function. The MAOS $M_{b\ell\nu}$ distributions include up to eight values of $M_{b\ell\nu}$ for each event.

Figure 5: The (left) $M_{b\ell}$ and (right) $M_{T2}^{bb}$ distributions in simulation with $M_t = 172.5$ GeV for several values of JSF. Two or three values are included in the $M_{b\ell}$ distribution for each event. One value is included in the $M_{T2}^{bb}$ distribution if it satisfies the kinematic requirement outlined in Section 4.2. The distributions are normalized to unit area. The three curves corresponding to each of the $M_{b\ell}$ and $M_{T2}^{bb}$ distributions are obtained using the GP regression technique described in Section 6.

Figure 6: Likelihood fit results as a function of $M_t^{\text{MC}}$ for the (top) 2D, (center left) 1D, (center right) MAOS, and (bottom) hybrid fits. For each value of $M_t^{\text{MC}}$, the fit is conducted using 50 pseudo-experiments in MC simulation. The mean parameter values, $M_t^{\text{fit}}$ and $\mathrm{JSF}^{\text{fit}}$, are represented by the points, with statistical uncertainties indicated by the error bars. A best-fit line of the form $y = ax + b$ is shown for each fit configuration.

Figure 7: Likelihood fit results using 1000 bootstrap pseudo-experiments for the (top) 2D fit, (center left) 1D fit, and (center right) MAOS fit. (Bottom) hybrid fit results given by the linear combination in Eq. (9) of the 1D and 2D fits. The error bars represent the statistical uncertainty corresponding to the number of pseudo-experiments in each bin.

Figure 8: Likelihood fit results corresponding to the 2D fit (left) and hybrid fit (right), obtained using 1000 pseudo-experiments constructed with the bootstrapping technique. The shaded histogram represents the number of pseudo-experiments in each bin of $M_t$ and JSF. Two-dimensional contours corresponding to $-2\Delta\log\mathcal{L} = 1\ (4)$ are shown, allowing the construction of one (two) $\sigma$ statistical intervals in $M_t$ and JSF. The hybrid fit results are given by a linear combination of the 1D and 2D fit results using Eq. (9).
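For a single parameter, the same criterion turns a likelihood scan into an interval: the one (two) $\sigma$ bounds are where $-2\Delta\log\mathcal{L}$ rises by 1 (4) above its minimum. A minimal one-dimensional sketch follows, assuming $-2\log\mathcal{L}$ has been evaluated on a sorted grid and interpolating linearly between grid points. `scan_interval` is a hypothetical helper, and the two-dimensional contours of the figure are not reproduced.

```python
def scan_interval(xs, m2logl, level=1.0):
    """Return (x_best, x_low, x_high) from a 1D scan of -2 log L.

    xs must be sorted; x_low/x_high are where -2*Delta(log L) first reaches
    `level` on either side of the minimum (None if the scan never reaches it).
    """
    i_min = min(range(len(xs)), key=lambda i: m2logl[i])
    delta = [v - m2logl[i_min] for v in m2logl]

    def crossing(i, j):
        # Linear interpolation between grid point i (below level) and j (at/above).
        t = (level - delta[i]) / (delta[j] - delta[i])
        return xs[i] + t * (xs[j] - xs[i])

    x_low = x_high = None
    for i in range(i_min, 0, -1):
        if delta[i - 1] >= level:
            x_low = crossing(i, i - 1)
            break
    for i in range(i_min, len(xs) - 1):
        if delta[i + 1] >= level:
            x_high = crossing(i, i + 1)
            break
    return xs[i_min], x_low, x_high


# Toy parabolic -2 log L centred at 172.5 GeV with sigma = 0.2 GeV (illustrative).
grid = [170.0 + 0.05 * k for k in range(101)]
scan = [((m - 172.5) / 0.2) ** 2 for m in grid]
print(scan_interval(grid, scan, level=1.0))
print(scan_interval(grid, scan, level=4.0))
```

Because the toy likelihood is exactly parabolic with $\sigma = 0.2$ GeV, the bounds should land near 172.3 and 172.7 GeV for level 1, and near 172.1 and 172.9 GeV for level 4.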

Figure 9: Maximum-likelihood fit result in a typical pseudo-experiment of the 2D likelihood fit in data. The best-fit parameter values for this pseudo-experiment are $M_t = 171.99$ GeV and JSF = 1.007. When the JSF parameter is constrained to unity in the 1D likelihood fit, the best-fit value of $M_t$ is 172.48 GeV. The lower panel shows the ratio between the distribution in data and the best-fit distribution in simulation.

Figure 10: The MAOS $M_{b\ell\nu}$ distribution corresponding to the maximum-likelihood fit result in a typical pseudo-experiment of the MAOS likelihood fit in data. The best-fit value of $M_t$ for this pseudo-experiment is 171.54 GeV. The lower panel shows the ratio between the distribution in data and the best-fit distribution in simulation.

Figure 12: Demonstration of the GP conditioning process, given in Eqs. (16) and (17), for one training point and one test point. The covariance between the values of the shape at the training and test points is represented by the ellipse. The known value of the shape at the training point (square point) determines the mean value of the shape at the test point (round point and vertical line). The distribution of possible values of the shape at the test point is represented by the dashed curve.

values, and the $(N+1)$-dimensional Gaussian distribution is reduced to a one-dimensional conditional Gaussian distribution representing the possible values of $f(u_{N+1})$.
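In the standard textbook form of GP conditioning (assumed here, since Eqs. (16) and (17) are not reproduced in this section), a zero-mean prior with covariance function $k$ gives the conditional mean $\mathbf{k}_*^\top K^{-1}\mathbf{f}$ and variance $k(u_*, u_*) - \mathbf{k}_*^\top K^{-1}\mathbf{k}_*$ at a test point $u_*$, where $K$ is the covariance matrix of the $N$ training points. The sketch below implements these formulas in pure Python with a squared-exponential kernel; the kernel, its `amplitude` and `length` parameters, and the small `jitter` term are hypothetical choices that may differ from those used in the analysis.

```python
import math


def sq_exp_kernel(a, b, amplitude=1.0, length=1.0):
    """Hypothetical squared-exponential covariance between inputs a and b."""
    return amplitude ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_condition(u_train, f_train, u_test, kernel=sq_exp_kernel, jitter=1e-9):
    """Conditional mean and variance of f(u_test) given f at the training points."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(u_train)] for i, a in enumerate(u_train)]
    k_star = [kernel(a, u_test) for a in u_train]
    L = cholesky(K)
    alpha = cho_solve(L, f_train)  # K^{-1} f
    v = cho_solve(L, k_star)       # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(u_test, u_test) - sum(k * vv for k, vv in zip(k_star, v))
    return mean, var


# One training point and one test point, as in Fig. 12 (toy values).
mean, var = gp_condition([0.0], [2.0], 1.0)
print(round(mean, 4), round(var, 4))
```

With one training point, the test-point mean is pulled toward the known value in proportion to their correlation, and the variance shrinks from the prior value of 1 by the same correlation squared, which is the behavior the ellipse and dashed curve in Fig. 12 depict.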

Table 1: Systematic uncertainties for the 2D, 1D, hybrid, and MAOS likelihood fits. The breakdown of JES and b quark fragmentation uncertainties into separate components is shown, where the components are added in quadrature to obtain the total. The 'up' and 'down' variations are given separately, with the sign of each variation indicating the direction of the corresponding shift in $M_t$ or JSF. The character highlights the uncertainty sources that are large in at least one of the likelihood fits.
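Combining the JES or b quark fragmentation components into a total, as the caption describes, is a sum in quadrature, which treats the components as uncorrelated. A minimal sketch follows; the component values are invented for illustration and are not entries of Table 1, and combining the 'up' and 'down' columns independently is an assumption about how the totals are assembled.

```python
import math


def quadrature_sum(components):
    """Total uncertainty from independent components: sqrt(sum of squares).

    Signs only encode the direction of a shift, so they drop out when squared.
    """
    return math.sqrt(sum(c * c for c in components))


def total_up_down(up_shifts, down_shifts):
    """Combine the 'up' and 'down' columns separately (assumed convention)."""
    return quadrature_sum(up_shifts), quadrature_sum(down_shifts)


# Illustrative component shifts in GeV (not values from Table 1).
up = [0.30, -0.40, 0.12]
down = [-0.28, 0.35, -0.10]
print(total_up_down(up, down))
```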
The corresponding uncertainty is evaluated by repeating the measurement with branching fraction values of 10.05% and 11.27%, which are variations about the nominal value of 10.50% and encompass the range of values measured