Measurement of the B(s) to mu+ mu- branching fraction and search for B0 to mu+ mu- with the CMS Experiment

Results are presented from a search for the rare decays B(s) to mu+ mu- and B0 to mu+ mu- in pp collisions at sqrt(s) = 7 and 8 TeV, with data samples corresponding to integrated luminosities of 5 and 20 inverse femtobarns, respectively, collected by the CMS experiment at the LHC. An unbinned maximum-likelihood fit to the dimuon invariant mass distribution gives a branching fraction B(B(s) to mu+ mu-) = (3.0 +1.0/-0.9)E-9, where the uncertainty includes both statistical and systematic contributions. An excess of B(s) to mu+ mu- events with respect to background is observed with a significance of 4.3 standard deviations. For the decay B0 to mu+ mu- an upper limit of B(B0 to mu+ mu-) < 1.1E-9 at the 95% confidence level is determined. Both results are in agreement with the expectations from the standard model.


In the standard model (SM) of particle physics, tree-level diagrams do not contribute to flavor-changing neutral-current (FCNC) decays. However, FCNC decays may proceed through higher-order loop diagrams, and this opens up the possibility for contributions from non-SM particles. In the SM, the rare FCNC decays $B^0_s\,(B^0) \to \mu^+\mu^-$ have small branching fractions of $\mathcal{B}(B^0_s \to \mu^+\mu^-) = (3.57 \pm 0.30) \times 10^{-9}$, corresponding to the decay-time integrated branching fraction, and $\mathcal{B}(B^0 \to \mu^+\mu^-) = (1.07 \pm 0.10) \times 10^{-10}$ [1,2]. Charge conjugation is implied throughout this Letter. Several extensions of the SM, such as supersymmetric models with nonuniversal Higgs boson masses [3], specific models containing leptoquarks [4], and the minimal supersymmetric standard model with large $\tan\beta$ [5,6], predict enhancements to the branching fractions for these rare decays. The decay rates can also be suppressed for specific choices of model parameters [7]. Over the past 30 years, significant progress in sensitivity has been made, with exclusion limits on the branching fractions improving by five orders of magnitude.
The search for the $B \to \mu^+\mu^-$ signal, where $B$ denotes $B^0_s$ or $B^0$, is performed in the dimuon invariant mass regions around the $B^0_s$ and $B^0$ masses. To avoid possible biases, the signal region $5.20 < m_{\mu\mu} < 5.45$ GeV was kept blind until all selection criteria were established. For the 7 TeV data, this Letter reports a re-analysis of the data used in the previous result [16], where the data were re-blinded. The combinatorial dimuon background, mainly from semileptonic decays of separate B mesons, is evaluated by extrapolating the data in nearby mass sidebands into the signal region. Monte Carlo (MC) simulations are used to account for backgrounds from $B$ and $\Lambda_b$ decays. These background samples consist of $B \to h\mu\nu$, $B \to h\mu\mu$, and $\Lambda_b \to p\mu\nu$ decays, as well as "peaking" decays of the type $B \to hh'$, where $h$, $h'$ are charged hadrons misidentified as muons, which give a dimuon invariant mass distribution that peaks in the signal region. The MC simulation event samples are generated using PYTHIA (version 6.424 for 7 TeV, version 6.426 for 8 TeV) [19], with the underlying event simulated with the Z2 tune [20], unstable particles decayed via EVTGEN [21], and the detector response simulated with GEANT4 [22]. A normalization sample of $B^+ \to J/\psi K^+ \to \mu^+\mu^- K^+$ decays is used to minimize uncertainties related to the $b\bar{b}$ production cross section and the integrated luminosity. A control sample of $B^0_s \to J/\psi\phi \to \mu^+\mu^- K^+K^-$ decays is used to validate the MC simulation and to evaluate potential effects from differences in fragmentation between $B^+$ and $B^0_s$. The efficiencies of all samples, including detector acceptances, are determined with MC simulation studies.
A detailed description of the CMS apparatus can be found in Ref. [23]. The CMS experiment uses a right-handed coordinate system, with the origin at the nominal interaction point, the $x$ axis pointing to the center of the LHC ring, the $y$ axis pointing up, and the $z$ axis along the counterclockwise-beam direction. The polar angle $\theta$ is measured from the positive $z$ axis and the azimuthal angle $\phi$ is measured in the $x$-$y$ plane. The main subdetectors used in this analysis are the silicon tracker and the muon detectors. Muons are tracked within the pseudorapidity region $|\eta| < 2.4$, where $\eta = -\ln[\tan(\theta/2)]$. A transverse momentum ($p_T$) resolution of about 1.5% is obtained for muons in this analysis [24].
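The pseudorapidity definition above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this example.

```python
import math

def pseudorapidity(theta: float) -> float:
    """eta = -ln[tan(theta/2)] for a polar angle theta in radians, 0 < theta < pi."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))

def polar_angle(eta: float) -> float:
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))

# A track perpendicular to the beam (theta = 90 degrees) has eta close to zero.
print(pseudorapidity(math.pi / 2))
# Polar angle, in degrees, at the edge of the muon acceptance |eta| = 2.4.
print(math.degrees(polar_angle(2.4)))
```

The second value shows that the acceptance limit $|\eta| < 2.4$ corresponds to tracks roughly ten degrees from the beam axis.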
The events are selected with a two-level trigger system. The first level only requires two muon candidates in the muon detectors. The high-level trigger (HLT) uses additional information from the silicon tracker to provide essentially a full event reconstruction. The dimuon invariant mass was required to satisfy $4.8 < m_{\mu\mu} < 6.0$ GeV. For the 7 TeV data set, the HLT selection required two muons, each with $p_T > 4.0$ GeV, and a dimuon $p_T^{\mu\mu} > 3.9$ GeV. For events having at least one muon with $|\eta| > 1.5$, $p_T^{\mu\mu} > 5.9$ GeV was required. For the 8 TeV data set, the $p_T$ criterion on the muon with lower $p_T$ was loosened to $p_T > 3.0$ GeV, with $p_T^{\mu\mu} > 4.9$ GeV. For events containing at least one muon with $|\eta| > 1.8$, the muons were each required to have $p_T > 4.0$ GeV, $p_T^{\mu\mu} > 7.0$ GeV, and the dimuon vertex fit $p$-value $> 0.5\%$. For the normalization and control samples the HLT selection required the following: two muons, each with $p_T > 4$ GeV and $|\eta| < 2.2$; $p_T^{\mu\mu} > 6.9$ GeV; $2.9 < m_{\mu\mu} < 3.3$ GeV; and the dimuon vertex fit $p$-value $> 15\%$. Two additional requirements were imposed in the transverse plane: (i) the pointing angle $\alpha_{xy}$ between the dimuon momentum and the vector from the average interaction point to the dimuon vertex had to fulfill $\cos\alpha_{xy} > 0.9$, and (ii) the flight length significance $\ell_{xy}/\sigma(\ell_{xy})$ had to be greater than 3, where $\ell_{xy}$ is the two-dimensional distance between the average interaction point and the dimuon vertex, and $\sigma(\ell_{xy})$ is its uncertainty. The signal, normalization, and control triggers required the three-dimensional (3D) distance of closest approach ($d_{ca}$) between the two muons to satisfy $d_{ca} < 0.5$ cm. The average trigger efficiency for events in the signal and normalization samples, as determined from MC simulation and calculated after all other selection criteria are applied, is in the range 39-85%, depending on the running period and detector region. The uncertainty in the ratio of trigger efficiencies (muon identification efficiencies) for the signal and normalization samples is estimated to be 3-6% (1-4%) by comparing simulation and data.
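The two transverse-plane requirements quoted for the normalization and control triggers can be sketched as follows. This is an illustrative calculation only: the function names and the example coordinates are hypothetical, and the uncertainty on the flight length is taken as a given input rather than derived from vertex covariance matrices.

```python
import math

def cos_alpha_xy(px, py, vx, vy, bx, by):
    """Cosine of the angle between the dimuon transverse momentum (px, py)
    and the vector from the average interaction point (bx, by) to the vertex (vx, vy)."""
    dx, dy = vx - bx, vy - by
    flight = math.hypot(dx, dy)
    pt = math.hypot(px, py)
    if flight == 0.0 or pt == 0.0:
        raise ValueError("zero flight length or zero transverse momentum")
    return (px * dx + py * dy) / (pt * flight)

def flight_significance_xy(vx, vy, bx, by, sigma_lxy):
    """Two-dimensional flight length divided by its uncertainty."""
    return math.hypot(vx - bx, vy - by) / sigma_lxy

def passes_displacement(px, py, vx, vy, bx, by, sigma_lxy):
    """Apply cos(alpha_xy) > 0.9 and l_xy / sigma(l_xy) > 3."""
    return (cos_alpha_xy(px, py, vx, vy, bx, by) > 0.9
            and flight_significance_xy(vx, vy, bx, by, sigma_lxy) > 3.0)

# Hypothetical candidate: momentum along x, vertex displaced mostly along x (cm).
print(passes_displacement(6.0, 0.0, 0.05, 0.005, 0.0, 0.0, 0.01))
```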
The $B \to \mu^+\mu^-$ candidates are constructed from two oppositely charged "tight" muons as described in Ref. [25]. Both muons must have $p_T > 4$ GeV and be consistent in direction and $p_T$ with the muons that triggered the event. A boosted decision tree (BDT) constructed within the TMVA framework [26] is trained to further separate genuine muons from those arising from misidentified charged hadrons. The variables used in the BDT can be divided into four classes: basic kinematic quantities, silicon-tracker fit information, combined silicon and muon track fit information, and muon detector information. The BDT is trained on MC simulation samples of B-meson decays to kaons and muons. Compared to the "tight" muons, the BDT working point used to select muons for this analysis reduces the hadron-to-muon misidentification probability by 50% while retaining 90% of true muons. The probability to misidentify a charged hadron as a muon because of decay in flight or detector punch-through is measured in data from samples of well-identified pions, kaons, and protons. This probability lies in the ranges $(0.5$-$1.3) \times 10^{-3}$, $(0.8$-$2.2) \times 10^{-3}$, and $(0.4$-$1.5) \times 10^{-3}$ for pions, kaons, and protons, respectively, depending on whether the particle is in the barrel or endcap, the running period, and the momentum. Each of these probabilities is ascribed an uncertainty of 50%, based on differences between data and MC simulation.
Candidates are kept for further analysis if they have $4.9 < m_{\mu\mu} < 5.9$ GeV, after constraining the tracks to a common vertex. The B-candidate momentum and vertex position are used to choose a primary vertex based on the distance of closest approach along the beamline. Since the background level and mass resolution depend significantly on $\eta_{\mu\mu}$, where $\eta_{\mu\mu}$ is the pseudorapidity of the B-meson candidate, the events are separated into two categories: the "barrel channel" with candidates where both muons have $|\eta| < 1.4$, and the "endcap channel" containing those where at least one muon has $|\eta| > 1.4$. The $m_{\mu\mu}$ resolution, as determined from simulated signal events, ranges from 32 MeV for $\eta_{\mu\mu} \approx 0$ to 75 MeV for $|\eta_{\mu\mu}| > 1.8$. Four isolation variables are defined. (1) $I = p_T^{\mu\mu}/(p_T^{\mu\mu} + \sum_{\mathrm{trk}} p_T)$, where $\sum_{\mathrm{trk}} p_T$ is the sum of $p_T$ of all tracks, other than muon candidates, satisfying $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} < 0.7$, with $\Delta\eta$ and $\Delta\phi$ as the differences in $\eta$ and $\phi$ between a charged track and the direction of the B candidate. The sum includes all tracks with $p_T > 0.9$ GeV that are (i) consistent with originating from the same primary vertex as the B candidate or (ii) have a $d_{ca}$ with respect to the B vertex $< 0.05$ cm and are not associated with any other primary vertex. (2) $I_\mu$ is the isolation variable of each muon, calculated as for the B candidate, but with respect to the muon track. A cone size of $\Delta R = 0.5$ around the muon and tracks with $p_T > 0.5$ GeV and $d_{ca} < 0.1$ cm from the muon are used. (3) $N^{\mathrm{close}}_{\mathrm{trk}}$ is defined as the number of tracks with $p_T > 0.5$ GeV and $d_{ca}$ with respect to the B vertex less than 0.03 cm. (4) $d^0_{ca}$ is defined as the smallest $d_{ca}$ to the B vertex, considering all tracks in the event that are either associated with the same primary vertex as the B candidate or not associated with any primary vertex.
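The definition of the B-candidate isolation $I$ can be made concrete with a short sketch. The `Track` container and its field names are hypothetical, the track list is assumed to exclude the muon candidates already, and the vertex-association flags are assumed to be provided by the reconstruction.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    pt: float        # transverse momentum in GeV
    eta: float
    phi: float
    same_pv: bool    # consistent with the primary vertex of the B candidate
    dca_to_b: float  # distance of closest approach to the B vertex in cm
    other_pv: bool   # associated with a different primary vertex

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def isolation(b_pt, b_eta, b_phi, tracks, cone=0.7, min_pt=0.9, max_dca=0.05):
    """I = pT(B) / (pT(B) + sum of track pT inside the cone)."""
    total = 0.0
    for t in tracks:
        if t.pt <= min_pt:
            continue
        # Condition (i) or (ii) from the text.
        vertex_ok = t.same_pv or (t.dca_to_b < max_dca and not t.other_pv)
        if vertex_ok and delta_r(t.eta, t.phi, b_eta, b_phi) < cone:
            total += t.pt
    return b_pt / (b_pt + total)

# An isolated candidate (no nearby tracks) has I = 1.
print(isolation(10.0, 0.0, 0.0, []))
```

Values of $I$ close to 1 indicate little hadronic activity around the candidate, which is the behavior expected for signal.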
The final selection is performed with BDTs trained to distinguish between signal and background event candidates. For the training, $B^0_s \to \mu^+\mu^-$ MC simulation samples are used for the signal, and candidates from the data dimuon mass sidebands after a loose preselection for the background. The preselection retains at least 10,000 events dominated by combinatorial background for each BDT. To avoid any selection bias, the data background events are randomly split into three sets, such that the training and testing of the BDT is performed on sets independent of its application. Studies with sideband events and signal MC simulation samples with shifted B mass show that the BDT response is independent of mass. Separate BDTs are trained for each of the four combinations of 7 and 8 TeV data and the barrel and endcap regions of the detector. For each BDT, a number of variables are considered and only those found to be effective are included. Each of the following twelve variables, shown to be independent of pileup, is used in at least one of the BDTs: $I$; $I_\mu$; $N^{\mathrm{close}}_{\mathrm{trk}}$; $d^0_{ca}$; $p_T^{\mu\mu}$; $\eta_{\mu\mu}$; the B-vertex fit $\chi^2$ per degree of freedom (dof); the $d_{ca}$ between the two muon tracks; the 3D pointing angle $\alpha_{3D}$; the 3D flight length significance $\ell_{3D}/\sigma(\ell_{3D})$; the 3D impact parameter $\delta_{3D}$ of the B candidate; and its significance $\delta_{3D}/\sigma(\delta_{3D})$, where $\sigma(\delta_{3D})$ is the uncertainty on $\delta_{3D}$. The last four variables are computed with respect to the primary vertex. Good agreement between data and MC simulation is observed for these variables. In total, including the division into three sets, 12 BDTs are trained.
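The three-way split can be sketched as below. The text states only that training and testing are done on sets independent of the application set; the cyclic assignment of roles used here is an assumption made for illustration, and the function names are hypothetical.

```python
import random

def split_three(events, seed=12345):
    """Randomly assign each event to one of three disjoint sets."""
    rng = random.Random(seed)
    sets = [[], [], []]
    for ev in events:
        sets[rng.randrange(3)].append(ev)
    return sets

def rotation(k):
    """Roles for BDT number k (assumed cyclic scheme): train on one set,
    test on a second, and apply to the third, so that no event is
    classified by a BDT that saw it during training or testing."""
    return {"train": k % 3, "test": (k + 1) % 3, "apply": (k + 2) % 3}

sets = split_three(range(30000))
for k in range(3):
    roles = rotation(k)
    print(k, roles, len(sets[roles["apply"]]))
```

With three such BDTs for each of the four combinations of data-taking period and detector region, one obtains the 12 BDTs mentioned in the text.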
The output discriminant $b$ of the BDT is used in two ways for further analysis. (1) In the 1D-BDT method, a minimum requirement on $b$ per channel is used to define the final selection. The requirement on $b$ is optimized for best $S/\sqrt{S+B}$ (where $S$ is the expected signal and $B$ the background) on statistically independent data control samples. The optimization gives $b > 0.29$ for both barrel and endcap in the $\sqrt{s} = 7$ TeV data, and $b > 0.36$ (0.38) in the barrel (endcap) for the $\sqrt{s} = 8$ TeV sample. The 1D-BDT method is used for the determination of the upper limit on $\mathcal{B}(B^0 \to \mu^+\mu^-)$. The signal efficiencies $\varepsilon_{\mathrm{tot}}$ for method (1) are provided in Table 1, together with the expected number of events (signal and signal plus background) for the $B^0$ signal region $5.20 < m_{\mu\mu} < 5.30$ GeV and the $B^0_s$ signal region $5.30 < m_{\mu\mu} < 5.45$ GeV. (2) In the categorized-BDT method, the discriminant $b$ is used to define twelve event categories with different signal-to-background ratios. For the $\sqrt{s} = 7$ TeV data in the barrel (endcap) channel, the two categories have boundaries of 0.10, 0.31, 1.00 (0.10, 0.26, 1.00). For the $\sqrt{s} = 8$ TeV sample in the barrel (endcap) channel, the corresponding boundaries for the four categories are 0.10, 0.23, 0.33, 0.44, 1.00 (0.10, 0.22, 0.33, 0.45, 1.00). This binning is chosen to give the same expected signal yield in each bin. The dimuon invariant mass distributions for the twelve categories of events are fitted simultaneously to obtain the final results. Method (2) has higher expected sensitivity and thus provides the main methodology for the extraction of $\mathcal{B}(B^0_s \to \mu^+\mu^-)$.
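Both uses of the discriminant can be sketched with toy inputs. The first function scans a requirement $b > t$ for the largest $S/\sqrt{S+B}$; the second chooses category boundaries so that each category holds the same number of signal events. All names, the per-event weights, and the toy scores are hypothetical and are not the samples used in the analysis.

```python
import math

def figure_of_merit(s, b):
    """S / sqrt(S + B), defined as zero when nothing is selected."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def best_threshold(sig_scores, bkg_scores, thresholds, sig_weight=1.0, bkg_weight=1.0):
    """Return (threshold, figure of merit) maximizing S/sqrt(S+B) for b > threshold."""
    best = None
    for t in thresholds:
        s = sig_weight * sum(1 for x in sig_scores if x > t)
        b = bkg_weight * sum(1 for x in bkg_scores if x > t)
        fom = figure_of_merit(s, b)
        if best is None or fom > best[1]:
            best = (t, fom)
    return best

def equal_signal_boundaries(sig_scores, n_bins, lo=0.10, hi=1.00):
    """Boundaries between lo and hi giving equal signal counts per category."""
    kept = sorted(x for x in sig_scores if lo <= x <= hi)
    inner = [kept[(i * len(kept)) // n_bins] for i in range(1, n_bins)]
    return [lo] + inner + [hi]

# Toy BDT outputs.
signal = [0.1, 0.3, 0.5, 0.7, 0.9]
background = [0.0, 0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.3, 0.35, 0.6]
print(best_threshold(signal, background, [0.0, 0.25, 0.55, 0.8]))
print(equal_signal_boundaries([0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], 4))
```

In a realistic setting the weights would scale the simulated signal and the sideband background to their expected yields in the signal window.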
Table 1: The signal selection efficiencies $\varepsilon_{\mathrm{tot}}$, the predicted number of SM signal events $N^{\mathrm{exp}}_{\mathrm{signal}}$, the expected number of signal and background events $N^{\mathrm{exp}}_{\mathrm{total}}$, and the number of observed events $N_{\mathrm{obs}}$ in the barrel and endcap channels for the 7 and 8 TeV data using the 1D-BDT method. The event numbers refer to the $B^0$ and $B^0_s$ signal regions, respectively.
The $B^+ \to J/\psi K^+$ ($B^0_s \to J/\psi\phi$) selection requires two oppositely charged muons with $3.0 < m_{\mu\mu} < 3.2$ GeV and $p_T^{\mu\mu} > 7$ GeV, combined with one or two tracks, assumed to be kaons, fulfilling $p_T > 0.5$ GeV and $|\eta| < 2.4$ ($|\eta| < 2.1$ in the 8 TeV data). The distance of closest approach between all pairs among the three (four) tracks is required to be less than 0.1 cm. For $B^0_s \to J/\psi\phi$ candidates the two assumed kaon tracks must have invariant mass $0.995 < m_{KK} < 1.045$ GeV and $\Delta R < 0.25$. The B vertex is fitted from the three (four) tracks; a candidate is accepted if the resulting invariant mass is in the range 4.8-6.0 GeV. The final selection is achieved using the same BDT as for the signal, with the following modifications: the B-vertex $\chi^2/\mathrm{dof}$ is determined from the dimuon vertex fit, and for the calculation of the isolation variables all B-candidate decay tracks are neglected.
The total efficiency to reconstruct with the 1D-BDT method a $B^+ \to J/\psi K^+ \to \mu^+\mu^- K^+$ decay, including the detector acceptance, is $\varepsilon^{B^+}_{\mathrm{tot}} = (0.98 \pm 0.08) \times 10^{-3}$ and $(0.36 \pm 0.04) \times 10^{-3}$, respectively, for the barrel and endcap channels in the 7 TeV analysis, and $(0.82 \pm 0.07) \times 10^{-3}$ and $(0.21 \pm 0.03) \times 10^{-3}$ for the 8 TeV analysis, where statistical and systematic uncertainties are combined in quadrature. The distributions of $b$ for the normalization and control samples are found to agree well between data and MC simulation, with residual differences used to estimate systematic uncertainties. No dependence of the selection efficiency on pileup is observed. The systematic uncertainty in the acceptance is estimated by comparing the values obtained with different $b\bar{b}$ production mechanisms (gluon splitting, flavor excitation, and flavor creation). The uncertainty in the event selection efficiency for the $B^+ \to J/\psi K^+$ normalization sample is evaluated from differences between measured and simulated $B^+ \to J/\psi K^+$ events. The uncertainty in the $B^0_s \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ signal efficiencies (3-10%, depending on the channel and $\sqrt{s}$) is evaluated using the $B^0_s \to J/\psi\phi$ control sample. The yields for the normalization (control) sample in each category are fitted with a double (single) Gaussian function. The backgrounds under the normalization and control sample peaks are described with an exponential (plus an error function for the normalization sample). Additional functions are included, with shape templates fixed from simulation, to account for backgrounds from $B^+ \to J/\psi\pi^+$ (Gaussian function) for the normalization sample, and $B^0 \to J/\psi K^{*0}$ (Landau function) for the control sample. In the 7 TeV data, the observed number of $B^+ \to J/\psi K^+$ candidates is $(71.2 \pm 4.1) \times 10^3$ in the barrel and $(21.4 \pm 1.1) \times 10^3$ in the endcap channel. For the 8 TeV sample the corresponding yields are $(309 \pm 16) \times 10^3$ (barrel) and $(69.3 \pm 3.5) \times 10^3$ (endcap). The uncertainties include a systematic component estimated from simulated events by considering alternative fitting functions.
The $B^0_s \to \mu^+\mu^-$ branching fraction is measured using
$$\mathcal{B}(B^0_s \to \mu^+\mu^-) = \frac{N_S}{N^{B^+}_{\mathrm{obs}}} \, \frac{f_u}{f_s} \, \frac{\varepsilon^{B^+}_{\mathrm{tot}}}{\varepsilon_{\mathrm{tot}}} \, \mathcal{B}(B^+),$$
and analogously for the $B^0 \to \mu^+\mu^-$ case, where $N_S$ ($N^{B^+}_{\mathrm{obs}}$) is the number of reconstructed $B^0_s \to \mu^+\mu^-$ ($B^+ \to J/\psi K^+$) decays, $\varepsilon_{\mathrm{tot}}$ ($\varepsilon^{B^+}_{\mathrm{tot}}$) is the total signal ($B^+$) efficiency, $\mathcal{B}(B^+) = (6.0 \pm 0.2) \times 10^{-5}$ [27] is the branching fraction for $B^+ \to J/\psi K^+ \to \mu^+\mu^- K^+$, and $f_u/f_s$ is the ratio of the $B^+$ and $B^0_s$ fragmentation fractions. The value $f_s/f_u = 0.256 \pm 0.020$, as measured by LHCb [28], is used and an additional systematic uncertainty of 5% is assigned to account for possible pseudorapidity and $p_T^{\mu\mu}$ dependence of this ratio. Studies based on the $B^+ \to J/\psi K^+$ and $B^0_s \to J/\psi\phi$ control samples reveal no discernible pseudorapidity or $p_T^{\mu\mu}$ dependence of this ratio in the kinematic region used in the analysis.
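The normalization to $B^+ \to J/\psi K^+$ amounts to the product $(N_S/N^{B^+}_{\mathrm{obs}})\,(f_u/f_s)\,(\varepsilon^{B^+}_{\mathrm{tot}}/\varepsilon_{\mathrm{tot}})\,\mathcal{B}(B^+)$ built from the quantities defined above. The sketch below evaluates it for a single channel. The $B^+$ yield, the $B^+$ efficiency, $f_s/f_u$, and $\mathcal{B}(B^+)$ are the values quoted in the text for the 7 TeV barrel channel; the signal yield and signal efficiency are hypothetical placeholders, not values from Table 1.

```python
def branching_fraction(n_sig, n_norm, eff_sig, eff_norm,
                       fs_over_fu=0.256, bf_norm=6.0e-5):
    """B(signal) = (N_S / N_B+) * (f_u / f_s) * (eff_B+ / eff_signal) * B(B+)."""
    return (n_sig / n_norm) * (1.0 / fs_over_fu) * (eff_norm / eff_sig) * bf_norm

# 7 TeV barrel normalization inputs quoted in the text.
n_bplus = 71.2e3
eff_bplus = 0.98e-3

# Hypothetical signal inputs, for illustration only.
n_signal = 3.0
eff_signal = 2.9e-3

print(branching_fraction(n_signal, n_bplus, eff_signal, eff_bplus))

# Branching fraction corresponding to one reconstructed signal event.
print(branching_fraction(1.0, n_bplus, eff_signal, eff_bplus))
```

Because the luminosity and the $b\bar{b}$ cross section enter the signal and normalization yields in the same way, they cancel in the ratio, which is why only efficiencies, yields, and $f_s/f_u$ appear.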
An unbinned maximum-likelihood fit to the $m_{\mu\mu}$ distribution is used to extract the signal and background yields. Events in the signal window can result from genuine signal, combinatorial background, background from semileptonic b-hadron decays, and the peaking background. The probability density functions (PDFs) for the signal, semileptonic, and peaking backgrounds are obtained from fits to MC simulation. The $B^0_s$ and $B^0$ signal shapes are modeled by Crystal Ball functions [29]. The peaking background is modeled with the sum of Gaussian and Crystal Ball functions (with a common mean). The semileptonic background is modeled with a Gaussian kernels method [30,31]. The PDF for the combinatorial background is modeled with a first-degree polynomial. Since the dimuon mass resolution $\sigma$, determined on an event-by-event basis from the dimuon mass fit, varies significantly, the PDFs described above are combined as a conditional product with the PDF for the per-event mass resolution, such that the Crystal Ball function width correctly reflects the resolution on a per-event basis. To avoid any effect of the correlation between $\sigma$ and the candidate mass, we divide the invariant mass uncertainty by the mass to obtain a "reduced" mass uncertainty, $\sigma_r = \sigma/m_{\mu\mu}$, which is used in the fit.
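A Crystal Ball shape whose width follows the per-event resolution can be sketched as follows. The function is left unnormalized, the tail parameters `alpha` and `n` are hypothetical, and the peak position is an approximate $B^0_s$ mass; none of these are the fitted values of the analysis. The width is recovered from the reduced uncertainty as $\sigma = \sigma_r \, m_{\mu\mu}$.

```python
import math

def crystal_ball(m, mean, sigma, alpha, n):
    """Unnormalized Crystal Ball: Gaussian core with a power-law tail
    on the low-mass side, joined continuously at t = -|alpha|."""
    t = (m - mean) / sigma
    a = abs(alpha)
    if t > -a:
        return math.exp(-0.5 * t * t)
    coeff = (n / a) ** n * math.exp(-0.5 * a * a)
    return coeff * (n / a - a - t) ** (-n)

def signal_shape(m, sigma_r, mean=5.367, alpha=1.5, n=3.0):
    """Evaluate the shape for one event, with width sigma = sigma_r * m (GeV)."""
    sigma = sigma_r * m
    return crystal_ball(m, mean, sigma, alpha, n)

# Two hypothetical events at the same mass with different per-event resolution.
print(signal_shape(5.30, sigma_r=0.006))
print(signal_shape(5.30, sigma_r=0.014))
```

The same candidate mass is less signal-like for a well-measured event than for a poorly measured one, which is the information the conditional PDF exploits.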
The dimuon mass distributions for the four channels (barrel and endcap in 7 and 8 TeV data), further divided into categories corresponding to different bins in the BDT parameter $b$, are fitted simultaneously. The results are illustrated for the most sensitive categories in Fig. 1. The fits for all twelve categories are shown in Appendix A. Pseudo-experiments, done with MC simulated events, confirm the robustness and accuracy of the fitting procedure.
Systematic uncertainties are constrained with Gaussian PDFs with the standard deviations of the constraints set equal to the uncertainties. Sources of systematic uncertainty arise from the hadron-to-muon misidentification probability, the branching fraction uncertainties (dominated by 100% for $\Lambda_b \to p\mu\nu$), and the normalization of the peaking background. The $B \to hh'$ and semileptonic backgrounds are estimated by normalizing to the observed $B^+ \to J/\psi K^+$ yield. The peaking background yield is constrained in the fit with log-normal PDFs with r.m.s. parameters set to the mean 1-standard-deviation uncertainties. The absolute level of peaking background has been studied on an independent data sample, obtained with single-muon triggers, and is found to agree with the expectation described above. The shape parameters for the peaking and the semileptonic backgrounds and for the signals are fixed to the expectation. The mass scale uncertainty at the B-meson mass is 6 MeV (7 MeV) for the barrel (endcap) channels, as determined with charmonium and bottomonium decays to dimuon final states.
An excess of $B^0_s \to \mu^+\mu^-$ decays is observed above the background predictions. The measured decay-time integrated branching fraction from the fit is $\mathcal{B}(B^0_s \to \mu^+\mu^-) = (3.0^{+1.0}_{-0.9}) \times 10^{-9}$, where the uncertainty includes both the statistical and systematic components, but is dominated by the statistical uncertainties. The observed (expected median) significance of the excess is 4.3 (4.8) standard deviations and is determined by evaluating the ratio of the likelihood value for the hypothesis with no signal, divided by the likelihood with $\mathcal{B}(B^0_s \to \mu^+\mu^-)$ floating. For this determination, $\mathcal{B}(B^0 \to \mu^+\mu^-)$ is allowed to float and is treated as a nuisance parameter in the fit (see the left plot in Fig. 2). The measured branching fraction is consistent with the expectation from the SM. With the 1D-BDT method, the observed (expected median) significance is 4.8 (4.7) standard deviations. Figure 3 shows the combined mass distributions weighted by $S/(S+B)$ for the categorized-BDT (left) and the 1D-BDT (right) methods. However, these distributions are illustrative only and were not used to obtain the final results.

Figure 3: Plots illustrating the combination of all categories used in the categorized-BDT method (left) and the 1D-BDT method (right). For these plots, the individual categories are weighted with $S/(S+B)$, where $S$ ($B$) is the signal (background) determined at the $B^0_s$ peak position. The overall normalization is set such that the fitted $B^0_s$ signal corresponds to the total yield of the individual contributions. These distributions are for illustrative purposes only and were not used in obtaining the final results.
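A likelihood ratio of this kind is commonly converted into a significance through the asymptotic relation $Z = \sqrt{2\,[\ln L_{\mathrm{free}} - \ln L_{\mathrm{null}}]}$, valid for one parameter of interest. The sketch below applies it to a single-bin Poisson counting experiment, which is a deliberate simplification: the analysis itself uses the full unbinned fit with nuisance parameters, and the function names and event counts here are hypothetical.

```python
import math

def significance(lnl_free, lnl_null):
    """Asymptotic significance from the log-likelihood difference between the
    best fit and the no-signal hypothesis."""
    q = 2.0 * (lnl_free - lnl_null)
    return math.sqrt(max(q, 0.0))

def poisson_lnl(n, mu):
    """Poisson log-likelihood up to a constant (the ln n! term cancels in ratios)."""
    return n * math.log(mu) - mu

def counting_significance(n_obs, b):
    """Significance for n_obs observed events over a known background b > 0."""
    s_hat = max(n_obs - b, 0.0)  # best-fit signal yield, constrained to be non-negative
    return significance(poisson_lnl(n_obs, s_hat + b), poisson_lnl(n_obs, b))

# Hypothetical counts: 20 events observed over an expected background of 10.
print(counting_significance(20, 10.0))
```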
No significant excess is observed for $B^0 \to \mu^+\mu^-$, and the upper limit $\mathcal{B}(B^0 \to \mu^+\mu^-) < 1.1 \times 10^{-9}$ ($9.2 \times 10^{-10}$) at 95% (90%) confidence level (CL) is determined with the CL$_S$ approach [32,33], based on the observed numbers of events in the signal and sideband regions with the 1D-BDT method as summarized in Table 1. The expected 95% CL upper limit for $\mathcal{B}(B^0 \to \mu^+\mu^-)$ in the presence of SM signal plus background (background only) is $6.3 \times 10^{-10}$ ($5.4 \times 10^{-10}$), where the statistical and systematic uncertainties are considered. The right plot in Fig. 2 shows the observed and expected CL$_S$ curves versus the assumed $\mathcal{B}(B^0 \to \mu^+\mu^-)$. From the fit, the branching fraction for this decay is determined to be $\mathcal{B}(B^0 \to \mu^+\mu^-) = (3.5^{+2.1}_{-1.8}) \times 10^{-10}$. The significance of this measurement is 2.0 standard deviations. The dimuon invariant mass distributions with the 1D-BDT method for the four channels are shown in Fig. 5 in the Appendix.
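The CL$_S$ construction can be illustrated for a single counting experiment, where CL$_S$ is the ratio of the probabilities to observe at most $n_{\mathrm{obs}}$ events under the signal-plus-background and background-only hypotheses. The sketch below ignores systematic uncertainties and the sideband information used in the analysis; function names and inputs are hypothetical.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def cls(n_obs, s, b):
    """CLs = CL(s+b) / CL(b) for a counting experiment."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(n_obs, b, cl=0.95, hi=50.0, tol=1e-6):
    """Signal yield at which CLs falls to 1 - cl, found by bisection
    (CLs decreases monotonically with the signal yield)."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts: 3 events observed with 2.5 expected from background.
print(upper_limit(3, 2.5))
```

A limit on the signal yield obtained this way would then be converted into a limit on the branching fraction through the normalization to the $B^+$ sample.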
In summary, a search for the rare decays $B^0_s \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ has been performed on a data sample of pp collisions at $\sqrt{s} = 7$ and 8 TeV corresponding to integrated luminosities of 5 and 20 fb$^{-1}$, respectively. No significant evidence is observed for $B^0 \to \mu^+\mu^-$ and an upper limit of $\mathcal{B}(B^0 \to \mu^+\mu^-) < 1.1 \times 10^{-9}$ is established at 95% CL. For $B^0_s \to \mu^+\mu^-$, an excess of events with a significance of 4.3 standard deviations is observed, and a branching fraction of $\mathcal{B}(B^0_s \to \mu^+\mu^-) = (3.0^{+1.0}_{-0.9}) \times 10^{-9}$ is determined, in agreement with the standard model expectations.
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMWF and FWF (Austria);

Figure 1: Results from the categorized-BDT method of the fit to the dimuon invariant mass distributions for the $\sqrt{s} = 8$ TeV data in the barrel (top) and endcap (bottom) for the BDT bins with the highest (left) and second-highest (right) signal-to-background ratio.

Figure 2: Left, scan of the ratio of the joint likelihood for $\mathcal{B}(B^0_s \to \mu^+\mu^-)$ and $\mathcal{B}(B^0 \to \mu^+\mu^-)$. As insets, the likelihood ratio scan for each of the branching fractions when the other is profiled together with other nuisance parameters; the significance at which the background-only hypothesis is rejected is also shown. Right, observed and expected CL$_S$ for $B^0 \to \mu^+\mu^-$ as a function of the assumed branching fraction.

Figure 4: Results of the fit to the dimuon invariant mass distributions for all BDT bins in the data with the categorized-BDT method. The points are the data, the solid line is the result of the fit, the shaded areas are the two B signals, and the different dotted lines are the backgrounds.

Figure 5: Results of the fit to the dimuon invariant mass distributions with the 1D-BDT method for the barrel (left) and endcap (right) from the 7 TeV (top) and 8 TeV (bottom) data samples. The points are the data, the solid line is the result of the fit, the shaded areas are the two B signals, and the different dotted lines are the backgrounds.