Measurement of the $B^0_s \to \mu^+ \mu^-$ branching fraction and search for $B^0 \to \mu^+ \mu^-$ decays at the LHCb experiment

A search for the rare decays $B^0_s \to\mu^+\mu^-$ and $B^0 \to\mu^+\mu^-$ is performed at the LHCb experiment. The data analysed correspond to an integrated luminosity of 1 fb$^{-1}$ of $pp$ collisions at a centre-of-mass energy of 7 TeV and 2 fb$^{-1}$ at 8 TeV. An excess of $B^0_s \to\mu^+\mu^-$ signal candidates with respect to the background expectation is seen with a significance of 4.0 standard deviations. A time-integrated branching fraction of ${\cal B}(B^0_s \to\mu^+\mu^-) = (2.9^{+1.1}_{-1.0})\times 10^{-9}$ is obtained and an upper limit of ${\cal B}(B^0 \to\mu^+\mu^-)<7.4\times 10^{-10}$ at 95% confidence level is set. These results are consistent with the Standard Model expectations.

The first search for dimuon decays of B mesons took place 30 years ago [9]. Since then, possible deviations from the SM prediction have been constrained by various searches, with the most recent results available in Refs. [10][11][12][13][14]. The first evidence for the B 0 s → µ + µ − decay was reported by LHCb in Ref. [12], with B(B 0 s → µ + µ − ) = (3.2 +1.5 −1.2 ) × 10 −9 , together with the lowest limit on the B 0 decay, B(B 0 → µ + µ − ) < 9.4×10 −10 at 95 % confidence level (CL). The results presented in this Letter improve on and supersede our previous measurements [12]. They are based on data collected with the LHCb detector, corresponding to an integrated luminosity of 1 fb −1 of pp collisions at the LHC recorded in 2011 at a centre-of-mass energy √ s = 7 TeV, and 2 fb −1 recorded in 2012 at √ s = 8 TeV. These data include an additional 1 fb −1 compared to the sample analysed in Ref. [12], and have been reconstructed with improved algorithms and detector alignment parameters leading to slightly higher signal reconstruction efficiency and better invariant mass resolution. The samples from the two centre-of-mass energies are analysed as a combined dataset.
The analysis strategy is very similar to that employed in Ref. [12], with a different multivariate operator based on a boosted decision trees algorithm (BDT) [15,16]. After trigger and loose selection requirements, B 0 (s) → µ + µ − candidates are classified according to dimuon invariant mass and BDT output. The distribution of candidates is compared with the background estimates to determine the signal yield and significance. The signal yield is converted into a branching fraction using a relative normalisation to the channels B 0 → K + π − and B + → J/ψK + with J/ψ → µ + µ − . Inclusion of charge-conjugated processes is implied throughout this Letter. To avoid potential biases, candidates in the signal regions were not examined until the analysis procedure had been finalised.
Signal and normalisation candidate events are selected by a hardware trigger and a subsequent software trigger [24]. The B 0 (s) → µ + µ − candidates are predominantly selected by single-muon and dimuon triggers. Candidate B + → J/ψK + decays are selected in a similar way, the only difference being a different dimuon mass requirement in the software trigger. Candidate B 0 (s) → h + h − decays (where h ( ) = π, K), used as control channels, are required to be triggered independently of the B 0 (s) decay products. Candidate B 0 (s) → µ + µ − decays are selected by combining two oppositely charged tracks with high quality muon identification [25], transverse momentum p T satisfying 0.25 < p T < 40 GeV/c, and momentum p < 500 GeV/c. The two tracks are required to form a secondary vertex (SV), with χ 2 per degree of freedom less than 9, displaced from any pp interaction vertex (primary vertex, PV) by a flight distance significance greater than 15. The smallest impact parameter χ 2 (χ 2 IP ), defined as the difference between the χ 2 of a PV formed with and without the track in question, is required to be larger than 25 with respect to any PV for the muon candidates. Only B candidates with p T > 0.5 GeV/c, decay time less than 9×τ B 0 s [3], impact parameter significance IP/σ(IP) < 5 with respect to the PV for which the B IP is minimal, and dimuon invariant mass in the range [4900, 6000] MeV/c 2 are selected. The control and normalisation channels are selected with almost identical requirements to those applied to the signal sample. The B 0 (s) → h + h − selection is the same as that of B 0 (s) → µ + µ − , except that muon identification criteria are not applied. The B + → J/ψK + decay is reconstructed from a dimuon pair combined to form the J/ψ → µ + µ − decay and selected in the same way as the B 0 (s) → µ + µ − signal samples, except for the requirements on the impact parameter significance and mass. After a requirement of χ 2 IP > 25, kaon candidates are combined with the J/ψ candidates. These selection criteria are completed by a requirement on the response of a multivariate operator, called MVS in Ref. [26] and unchanged since then, applied to candidates in both signal and normalisation channels. After the trigger and selection requirements are applied, 55 661 signal dimuon candidates are found, which are used for the search.
The main discrimination between the signal and combinatorial background is brought by the BDT, which is optimised using simulated samples of B 0 s → µ + µ − events for the signal and bb → µ + µ − X events for the background. The BDT combines information from the following input variables: the B candidate decay time, IP and p T ; the minimum χ 2 IP of the two muons with respect to any PV; the distance of closest approach between the two muons; and the cosine of the angle between the muon momentum in the dimuon rest frame and the vector perpendicular to both the B candidate momentum and the beam axis. Moreover two different measures for the isolation of signal candidates are also included: the number of good two-track vertices a muon can make with other tracks in the event; and the B candidate isolation, introduced in Ref. [27]. With respect to the multivariate operator used in previous analyses [12,26], the minimum p T of the two muons is no longer used while four new variables are included to improve the separation power. The first two are the absolute values of the differences between the pseudorapidities of the two muon candidates and between their azimuthal angles. The others are the angle of the momentum of the B candidate in the laboratory frame, and the angle of the positive muon from the B candidate in the rest frame of the B candidate, both with respect to the sum of the momenta of tracks, in the rest frame of the B candidate, consistent with originating from the decay of a b hadron produced in association to the signal candidate. In total, 12 variables enter into the BDT.
The variables used in the BDT are chosen so that the dependence on dimuon invariant mass is linear and small to avoid biases. The BDT is constructed to be distributed uniformly in the range [0,1] for signal, and to peak strongly at zero for the background. The BDT response range is divided into eight bins with boundaries 0.0, 0.25, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0.
The expected BDT distributions for the B 0  The expected B 0 s → µ + µ − BDT distribution is shown in Fig. 1.
The invariant mass distribution of the signal decays is described by a Crystal Ball function [28]. The peak values (m B 0 s and m B 0 ) and resolutions (σ B 0 s and σ B 0 ) are obtained from B 0 s → K + K − and B 0 → K + π − , B 0 → π + π − decays, for the B 0 s and B 0 mesons. The resolutions are also determined with a power-law interpolation between the measured resolutions of charmonium and bottomonium resonances decaying into two muons. The two methods are in agreement and the combined results are σ B 0 s = 23.2 ± 0.4 MeV/c 2 and σ B 0 = 22.8 ± 0.4 MeV/c 2 . The transition point of the radiative tail is obtained from simulated B 0 s → µ + µ − events [21] smeared to reproduce the mass resolution measured in data.
The numbers of B 0 s → µ + µ − and B 0 → µ + µ − candidates, N B 0 (s) →µ + µ − , are converted into branching fractions with where N norm is the number of normalisation channel decays obtained from a fit to the relevant invariant mass distribution, and B norm the corresponding branching fraction. The fractions f d(s) and f norm refer to the probability for a b quark to fragment into the corresponding B meson. The value f s /f d = 0.259 ± 0.015, measured by LHCb in pp collision data at √ s = 7 TeV [29,30], is used and f d = f u is assumed. The stability of f s /f u between √ s = 7 TeV and 8 TeV is verified by comparing the ratios of the yields of B 0 s → J/ψφ and B + → J/ψK + decays. The effect of the measured dependence of f s /f d on p T [29] is found to be negligible.
The efficiency sig(norm) for the signal (normalisation) channel is the product of the reconstruction efficiency of the final state particles including the geometric detector acceptance, the selection efficiency and the trigger efficiency. The ratio of acceptance, reconstruction and selection efficiencies of the signal compared to the normalisation channel is computed with samples of simulated events, assuming the SM decay time distribution, corrected to take into account known differences between data and simulation. The tracking and particle identification efficiencies are measured from control channels in data. Residual differences between data and simulation are treated as sources of systematic uncertainty. The trigger efficiency is evaluated with data-driven techniques [24].
Misidentification probabilities for the tracks in these decays are measured directly with control channels in data. Background sources without any misidentification such as B + c → J/ψµ + ν µ [34] and B 0(+) → π 0(+) µ + µ − [35] decays are also considered. The expected yields of all the b-hadron background modes are estimated by normalising to the B + → J/ψK + decay with the exception of B 0 (s) → h + h − , for which the explicit selection yields are used, correcting for the trigger efficiency ratio. No veto is imposed on photons, as the contribution of B 0 s → µ + µ − γ is negligible, as are contributions from B 0 s → µ + µ − ν µνµ decays [36,37]. The expected number of events for each of the backgrounds from b-hadron decays is shown in Table 1. The only one of these contributions that is relevant under the signal mass peaks is from B 0 (s) → h + h − decays.
A simultaneous unbinned maximum-likelihood fit to the data is performed in the mass projections of the BDT bins to determine the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions, which are free parameters. The B 0 s → µ + µ − and B 0 → µ + µ − fractional yields in BDT bins are constrained to the BDT fractions calibrated with the B 0 (s) → h + h − sample. The parameters of the Crystal Ball functions, that describe the mass shapes, and the normalisation factors are restricted by Gaussian constraints according to their expected values and uncertainties. The backgrounds from B 0 (s) → h + h − , B 0 → π − µ + ν µ , B 0 s → K − µ + ν µ and B 0(+) → π 0(+) µ + µ − are included as separate components in the fit. The fractional yields of the b-hadron backgrounds in each BDT bin and their overall yields are limited by Gaussian constraints around the expected values according to their uncertainties. The combinatorial background in each BDT bin is parametrised with an exponential function for which both the slope and the normalisation are allowed to vary freely. The resulting BDT distribution is compared to that expected for the signal in Fig. 1.
An excess of B 0 s → µ + µ − candidates with respect to the expectation from the background only is seen with a significance of 4.0 standard deviations (σ), while the significance of the B 0 → µ + µ − signal is 2.0 σ. These significances are determined from the change in likelihood from fits with and without the signal component. The median significance expected for a SM B 0 s → µ + µ − signal is 5.0 σ. The simultaneous unbinned maximum-likelihood fit results in The statistical uncertainty is derived by repeating the fit after fixing all the fit parameters, except the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions and the slope and normalisation of the combinatorial background, to their expected values. The systematic uncertainty is obtained by subtracting in quadrature the statistical uncertainty from the total uncertainty obtained from the likelihood with all nuisance parameters allowed to vary according to their uncertainties. Additional systematic uncertainties reflect the impact on the result of changes in the parametrisation of the background by including the Λ 0 b → pµ −ν µ component and by varying the mass shapes of backgrounds from b-hadron decays, and are added in quadrature. The correlation between the branching fractions parameters of both decay modes is +3.3 %. The values of the B 0 (s) → µ + µ − branching fractions obtained from the fit are in agreement with the SM expectations. The invariant mass distribution of the B 0 (s) → µ + µ − candidates with BDT > 0.7 is shown in Fig. 2.
As no significant excess of B 0 → µ + µ − events is found, a modified frequentist approach, the CL s method [38] is used, to set an upper limit on the branching fraction. The method provides CL s+b , a measure of the compatibility of the observed distribution with the signal plus background hypothesis, CL b , a measure of the compatibility with the backgroundonly hypothesis, and CL s = CL s+b /CL b . A search region is defined around the B 0 invariant mass as m B 0 ± 60 MeV/c 2 . For each BDT bin the invariant mass signal region is divided into nine bins with boundaries m B 0 ± 18, 30, 36, 48, 60 MeV/c 2 , leading to a total of 72 search bins.
An exponential function is fitted, in each BDT bin, to the invariant mass sidebands. Even though they do not contribute to the signal search window, the b-hadron backgrounds are added as components in the fit to account for their effect on the combinatorial background estimate. The uncertainty on the expected number of combinatorial background events per bin is determined by applying a Poissonian fluctuation to the number of events observed in the sidebands and by varying the exponential slopes according to their uncertainties. In each bin, the expectations for B 0 s → µ + µ − decays assuming the SM branching fraction and for B 0 (s) → h + h − background are accounted for. For each branching fraction hypothesis, the expected number of signal events is estimated from the normalisation factor. Signal events are distributed in bins according to the invariant mass and BDT calibrations.
In each bin, the expected numbers of signal and background events are computed and compared to the number of observed candidates using CL s . The expected and observed upper limits for the B 0 → µ + µ −  channel are summarised in Table 2 and the expected and observed CL s values as functions of the branching fraction are shown in Fig. 3. In summary, a search for the rare decays B 0 s → µ + µ − and B 0 → µ + µ − is performed with pp collision data corresponding to integrated luminosities of 1 fb −1 and 2 fb −1 collected at √ s = 7 TeV and 8 TeV, respectively. The B 0 decay yield is not significant and an improved upper limit of B(B 0 → µ + µ − ) < 7.4 × 10 −10 at 95 % CL is obtained. The B 0 s → µ + µ − signal is seen with a significance of 4.0 σ. The time-integrated branching fraction B(B 0 s → µ + µ − ) is measured to be (2.9 +1.1 −1.0 ) × 10 −9 , in agreement with the SM prediction. These measurements supersede and improve on our previous results, and tighten the constraints on possible new physics contributions to these decays.