Measurement of the $B^0_s\to\mu^+\mu^-$ branching fraction and effective lifetime and search for $B^0\to\mu^+\mu^-$ decays

A search for the rare decays $B^0_s\to\mu^+\mu^-$ and $B^0\to\mu^+\mu^-$ is performed at the LHCb experiment using data collected in $pp$ collisions corresponding to a total integrated luminosity of 4.4 fb$^{-1}$. An excess of $B^0_s\to\mu^+\mu^-$ decays is observed with a significance of 7.8 standard deviations, representing the first observation of this decay in a single experiment. The branching fraction is measured to be ${\cal B}(B^0_s\to\mu^+\mu^-)=\left(3.0\pm 0.6^{+0.3}_{-0.2}\right)\times 10^{-9}$, where the first uncertainty is statistical and the second systematic. The first measurement of the $B^0_s\to\mu^+\mu^-$ effective lifetime, $\tau(B^0_s\to\mu^+\mu^-)=2.04\pm 0.44\pm 0.05$ ps, is reported. No significant excess of $B^0\to\mu^+\mu^-$ decays is found and a 95 % confidence level upper limit, ${\cal B}(B^0\to\mu^+\mu^-)<3.4\times 10^{-10}$, is determined. All results are in agreement with the Standard Model expectations.

This Letter reports measurements of the B 0 s → µ + µ − and B 0 → µ + µ − time-integrated branching fractions, which supersede the previous LHCb results [9], and the first measurement of the B 0 s → µ + µ − effective lifetime. Results are based on data collected with the LHCb detector, corresponding to an integrated luminosity of 1 fb −1 of pp collisions at a centre-of-mass energy √ s = 7 TeV, 2 fb −1 at √ s = 8 TeV and 1.4 fb −1 recorded at √ s = 13 TeV. The first two datasets are referred to as Run 1 and the latter as Run 2. At various stages of the analysis multivariate classifiers are employed to select the signal. In particular, after trigger and loose selection requirements, B 0 (s) → µ + µ − candidates are classified according to their dimuon mass and the output variable, BDT, of a multivariate classifier based on a boosted decision tree [16], which is employed to separate signal and combinatorial background. The signal yield is determined from a fit to the dimuon mass distribution of candidates and is converted into a branching fraction using as normalisation modes the decays B 0 → K + π − and B + → J/ψK + , with J/ψ → µ + µ − (inclusion of charge-conjugated processes is implied throughout this Letter).
The analysis strategy is similar to that employed in Ref. [9] and has been optimised to enhance the sensitivity to both B 0 s and B 0 decays to µ + µ − . This is achieved through a better rejection of misidentified b-hadron decays such as B 0 (s) → h + h − (where h ( ) = π, K) and the development of an improved boosted decision tree for the BDT classifier. The B 0 s → µ + µ − effective lifetime is measured from the background-subtracted decay-time distribution of signal candidates in the lowest-background BDT region as defined later. To avoid potential biases, candidates in the dimuon mass signal region ([5200, 5445] MeV/c 2 ) were not examined until the analysis procedure was finalised.
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Refs. [17,18]. It includes a high-precision tracking system consisting of a silicon-strip vertex detector, surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. Particle identification is provided by two ring-imaging Cherenkov detectors, an electromagnetic and a hadronic calorimeter, and a muon system composed of alternating layers of iron and multiwire proportional chambers. The simulated events used in this analysis are produced using the software described in Refs. [19,20].
Candidate events for signal and normalisation are selected by a hardware trigger followed by a software trigger [21]. The B 0 (s) → µ + µ − candidates are predominantly selected by single-muon and dimuon triggers. The B + → J/ψK + candidates are selected in a very similar way, the only difference being a different dimuon mass requirement in the software trigger. Candidate B 0 (s) → h + h − decays are used as control and normalisation channels.
The B 0 (s) → µ + µ − candidates are reconstructed by combining two oppositely charged particles with transverse momentum with respect to the beam, p T , satisfying 0.25 < p T < 40 GeV/c, momentum p < 500 GeV/c, and high-quality muon identification [22]. Compared to the previous analysis, the muon identification requirements are tightened such that the misidentified B 0 (s) → h + h − background is reduced by approximately 50%, while the signal efficiency decreases by about 10%. The muon candidates are required to form a secondary vertex with a vertex-fit χ 2 per degree of freedom smaller than 9 and separated from any primary pp interaction vertex (PV) by a flight distance significance greater than 15. Only muon candidate tracks with χ 2 IP > 25 for any PV are selected, where χ 2 IP is defined as the difference between the vertex-fit χ 2 of the PV formed with and without the particle in question. In the selection, B 0 (s) candidates must have a decay time less than 9 τ B 0 s , χ 2 IP < 25 with respect to the PV for which the χ 2 IP is minimal (henceforth called the B 0 (s) PV), p T > 0.5 GeV/c and a dimuon mass in the range [4900, 6000] MeV/c 2 . A B 0 (s) candidate is rejected if either of the two candidate muons combined with any other oppositely charged muon candidate in the event has a mass within 30 MeV/c 2 of the J/ψ mass [15]. The normalisation channels are selected with almost identical requirements to those applied to the signal sample. The B 0 (s) → h + h − selection is the same as that of B 0 (s) → µ + µ − , except that the muon identification criteria are replaced with hadron identification requirements. The B + → J/ψK + decay is reconstructed by combining a muon pair, consistent with a J/ψ from a detached vertex, and a kaon candidate with χ 2 IP > 25 for all PVs in the event. These selection criteria are completed by a loose requirement on the response of a multivariate classifier, described in Ref. [23] and unchanged since then, applied to candidates in both signal and normalisation channels. The classifier takes as input quantities related to the direction of the B 0 (s) candidate, its impact parameter with respect to the B 0 (s) PV, the separation between the final-state tracks, and their impact parameters with respect to any PV. After the trigger and selection requirements 78 241 signal candidates are found, which form the dataset for the subsequent branching fraction measurement.
The separation between signal and combinatorial background is achieved by means of the BDT variable, where the boosted decision tree is optimised using simulated samples of B 0 s → µ + µ − events for signal and of bb → µ + µ − X events for background. The classifier combines information from the following input variables: ∆φ 2 + ∆η 2 , where ∆φ and ∆η are the azimuthal angle and pseudorapidity differences between the two muon candidates; the minimum χ 2 IP of the two muons with respect to the B 0 (s) PV; the angle between the B 0 (s) candidate momentum and the vector joining the B 0 (s) decay vertex and B 0 (s) PV; the B 0 (s) candidate vertex-fit χ 2 and impact parameter significance with respect to the B 0 (s) PV. In addition, two isolation variables are included, to quantify the compatibility of the other tracks in the event with originating from the same hadron decay as the signal muon candidates. Most of the combinatorial background is composed of muons originating from semileptonic b-hadron decays, in which other charged particles may be produced and reconstructed. The isolation variables are constructed to recognise these particles and differ in the type of tracks being considered: the first considers tracks that have been reconstructed both before and after the magnet, while the second considers tracks reconstructed only in the vertex detector. The isolation variables are determined based on the proximity of the two muon candidates to the tracks of the event and are optimised using simulated B 0 s → µ + µ − and bb → µ + µ − X events. The proximity of each muon candidate to a track is measured using a multivariate classifier that takes as input quantities such as the angular and spatial separation between the muon candidate and the track, the signed distance between the muon-track vertex and the B 0 (s) candidate or primary vertex, and the kinematic and impact parameter information of the track.
The BDT variable is constructed to be distributed uniformly in the range [0,1] for signal, and to peak strongly at zero for background. Its correlation with the dimuon mass is below 5%. Compared to the multivariate classifier used in the previous measurement [9], the combinatorial background with BDT > 0.25 is reduced by approximately 50%, mainly due to the improved performance of the isolation variables.
The expected B 0 (s) → µ + µ − BDT distributions are determined from those of B 0 → K + π − decays in data after correcting them for distortions due to trigger and muon identification. An additional correction is made for the B 0 s signal, assuming the SM prediction, to account for the difference between the B 0 and B 0 s → µ + µ − lifetimes, which affects the BDT distribution. The mass distribution of the signal decays is described by a Crystal Ball function [24]. The peak values for the B 0 s and B 0 mesons are obtained from the mass distributions of B 0 s → K + K − and B 0 → K + π − samples, respectively. The mass resolutions as function of µ + µ − mass are determined with a power-law interpolation between the measured resolutions of charmonium and bottomonium resonances decaying into two muons. The Crystal Ball radiative tail is obtained from simulated B 0 s → µ + µ − events [20], which are smeared such that they reproduce the 23 MeV/c 2 mass resolution measured in data.

The signal branching fractions are measured with
where N B 0 (s) →µ + µ − is the number of observed signal decays, N norm is the number of normalisation-channel decays (B + → J/ψK + and B 0 → K + π − ), B norm is the corresponding branching fraction [15], and sig ( norm ) is the total efficiency for the signal (normalisation) channel. The fraction f d(s) indicates the probability for a b quark to fragment into a B 0 (s) meson. Assuming f d = f u , the fragmentation probability f norm for the B 0 and B + normalisation channel is set to f d . The value of f s /f d in pp collision data at √ s = 7 TeV has been measured by LHCb to be 0.259 ± 0.015 [25]. The stability of f s /f d at √ s = 8 TeV and 13 TeV is evaluated by comparing the observed variation of the ratio of the efficiency-corrected yields of B 0 s → J/ψφ and B + → J/ψK + decays. The effect of increased collision energy is found to be negligible for data at √ s = 8 TeV while a scaling factor of 1.068 ± 0.046 is applied for data at √ s = 13 TeV. The efficiency sig(norm) includes the detector acceptance, trigger, reconstruction and selection efficiencies of the final-state particles. The acceptance, reconstruction and selection efficiencies are computed with samples of simulated events whose decay-time distributions are generated according to the SM prediction. The tracking and particle identification efficiencies are determined using control channels in data [26,27]. The trigger efficiencies are evaluated with data-driven techniques [28].
The combinatorial background is distributed almost uniformly over the mass range. In addition, the signal region and the low-mass sideband ([4900, 5200] MeV/c 2 ) are populated by backgrounds from exclusive b-hadron decays, which can be classified in two categories.
The Run 1 and Run 2 datasets are each divided into five subsets based on bins in the BDT variable with boundaries 0.0, 0.25, 0.4, 0.5, 0.6 and 1.0. The B 0 s → µ + µ − and B 0 → µ + µ − branching fractions are determined with a simultaneous unbinned maximum likelihood fit to the dimuon mass distribution in each BDT bin of the two datasets. The B 0 s → µ + µ − and B 0 → µ + µ − fractional yields in each BDT bin and the parameters of the Crystal Ball functions that describe the shapes of the mass distributions are Gaussianconstrained according to their expected values and uncertainties. The combinatorial background in each BDT bin is parameterised with an exponential function, with a common slope parameter for all bins of a given dataset, while the yield is allowed to vary independently. The exclusive backgrounds are included as separate components in the fit. Their overall yields as well as the fractions in each BDT bin are Gaussian-constrained according to their expected values. Their mass shapes are determined from a simulation for each BDT bin.
The values of the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions obtained from the fit The statistical uncertainty is derived by repeating the fit after fixing all the fit parameters, except the B 0 → µ + µ − and B 0 s → µ + µ − branching fractions, the background yields and the slope of the combinatorial background, to their expected values. The systematic uncertainties of B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) are dominated by the uncertainty on f s /f d and the knowledge of the exclusive backgrounds, respectively. The correlation between the two branching fractions is negligible. The mass distribution of the B 0 (s) → µ + µ − candidates with BDT > 0.5 is shown in Fig. 1, together with the fit result [38].
An excess of B 0 s → µ + µ − candidates with respect to the expectation from background is observed with a significance of 7.8 standard deviations (σ), while the significance of the B 0 → µ + µ − signal is 1.6 σ. The significances are determined, using Wilks' theorem [39], from the difference in likelihood between fits with and without the signal component.
Since no significant B 0 → µ + µ − signal is observed, an upper limit on the branching fraction is set using the CL s method [40]. The ratio between the likelihoods in two hypotheses, signal plus background and background only, is used as the test statistic.    For the B 0 s → µ + µ − lifetime determination, the data are background-subtracted with the sPlot technique [41], using a fit to the dimuon mass distribution to disentangle signal and background components statistically. Subsequently, a fit to the signal decay-time distribution is made with an exponential function multiplied by the acceptance function of the detector. The B 0 s candidates are selected using criteria similar to those applied in the branching fraction analysis, the main differences being a reduced dimuon mass window, [5320, 6000] MeV/c 2 , and looser particle identification requirements on the muon candidates. The former change allows the fit model for the B 0 s → µ + µ − signal to be simplified by removing most of the B 0 → µ + µ − and exclusive background decays that populate the lower dimuon mass region, while the latter increases the signal selection efficiency. Furthermore, instead of performing a fit in bins of BDT, a requirement of BDT > 0.55 is imposed. All these changes minimise the statistical uncertainty on the measured effective lifetime. This selection results in a final sample of 42 candidates.
The mass fit includes the B 0 s → µ + µ − and combinatorial background components. The parameterisations of the mass shapes are the same as used in the branching fraction analysis. The correlation between the mass and the reconstructed decay time of the selected candidates is less than 3%.
The variation of the trigger and selection efficiency with decay time is corrected for in the fit by introducing an acceptance function, determined from simulated signal events that are weighted to match the properties of the events seen in data. The use of simulated events to determine the decay-time acceptance function is validated by measuring the effective lifetime of B 0 → K + π − decays selected in data. The measured effective lifetime is 1.52 ± 0.03 ps, where the uncertainty is statistical only, consistent with the world average [15]. The statistical uncertainty on the measured B 0 → K + π − lifetime is taken as the systematic uncertainty associated with the use of simulated events to determine the B 0 s → µ + µ − acceptance function. The accuracy of the fit for the B 0 s → µ + µ − effective lifetime is estimated using a large number of simulated experiments with signal and background contributions equal, on average, to those observed in the data. The contamination from B 0 → µ + µ − , B 0 (s) → h + h − and semileptonic decays above 5320 MeV/c 2 is small and not included in the fit. The effect on the effective lifetime from the unequal production rate of B 0 s and B 0 s mesons [42] is negligible. A bias may also arise if A µ + µ − ∆Γ = ±1, with the consequence that the underlying decay-time distribution is the sum of two exponential distributions with the lifetimes of the light and heavy mass eigenstates. In this case, as the selection efficiency varies with the decay time, the returned value of the lifetime from the fit is not exactly equal to the definition of the effective lifetime even if the decay-time acceptance function is correctly accounted for. This effect has been evaluated for the scenario where there are equal contributions from both eigenstates to the decay. The result can also be biased if the background has a much longer mean lifetime than B 0 s → µ + µ − decays; this is mitigated by an upper decay-time cut of 13.5 ps. Any remaining bias is evaluated using the background decay-time distribution of the much larger B 0 → K + π − data sample. All of these effects are found to be small compared to the statistical uncertainty and combine to give 0.05 ps, with the main contributions arising from the fit accuracy and the decay-time acceptance (0.03 ps each). The mass distribution of the selected B 0 s → µ + µ − candidates is shown in Fig. 2 (top). In summary, a search for the rare decays B 0 s → µ + µ − and B 0 → µ + µ − is performed in pp collision data corresponding to a total integrated luminosity of 4.4 fb −1 . The B 0 s → µ + µ − signal is seen with a significance of 7.8 standard deviations and provides the first observation of this decay from a single experiment. The time-integrated B 0 s → µ + µ − branching fraction is measured to be 3.0 ± 0.6 +0.3 −0.2 × 10 −9 , under the A µ + µ − ∆Γ = 1 hypothesis. This is the most precise measurement of this quantity to date. In addition, the first measurement of the B 0 s → µ + µ − effective lifetime, τ (B 0 s → µ + µ − ) = 2.04 ± 0.44 ± 0.05 ps, is presented. No evidence for a B 0 → µ + µ − signal is found, and the upper limit B(B 0 → µ + µ − ) < 3.4 × 10 −10 at 95% confidence level is set. The results are in agreement with the SM predictions and tighten the existing constraints on possible New Physics contributions to these decays.  [33] LHCb collaboration, R. Aaij et al., Measurement of the ratio of B + c branching fractions to J/ψπ + and J/ψµ + ν µ , Phys. Rev. D90 (2014) 032009, arXiv:1407.2126.