First evidence for the decay Bs ->mu+ mu-

A search for the rare decays Bs->mu+mu- and B0->mu+mu- is performed using data collected in 2011 and 2012 with the LHCb experiment at the Large Hadron Collider. The data samples comprise 1.1 fb^-1 of proton-proton collisions at sqrt{s} = 8 TeV and 1.0 fb^-1 at sqrt{s}=7 TeV. We observe an excess of Bs ->mu+ mu- candidates with respect to the background expectation. The probability that the background could produce such an excess or larger is 5.3 x 10^-4 corresponding to a signal significance of 3.5 standard deviations. A maximum-likelihood fit gives a branching fraction of BR(Bs ->mu+ mu-) = (3.2^{+1.5}_{-1.2}) x 10^-9, where the statistical uncertainty is 95% of the total uncertainty. This result is in agreement with the Standard Model expectation. The observed number of B0 ->mu+ mu- candidates is consistent with the background expectation, giving an upper limit of BR(B0 ->mu+ mu-)<9.4 x 10^-10 at 95% confidence level.

A search for the rare decays B 0 s → µ + µ − and B 0 → µ + µ − is performed using data collected in 2011 and 2012 with the LHCb experiment at the Large Hadron Collider. The data samples comprise 1.1 fb −1 of proton-proton collisions at √ s = 8 TeV and 1.0 fb −1 at √ s = 7 TeV. We observe an excess of B 0 s → µ + µ − candidates with respect to the background expectation. The probability that the background could produce such an excess or larger is 5.3×10 −4 corresponding to a signal significance of 3.5 standard deviations. A maximum-likelihood fit gives a branching fraction of B(B 0 s → µ + µ − ) = (3.2 +1. 5 −1.2 )×10 −9 , where the statistical uncertainty is 95 % of the total uncertainty. This result is in agreement with the Standard Model expectation. The observed number of B 0 → µ + µ − candidates is consistent with the background expectation, giving an upper limit of B(B 0 → µ + µ − ) < 9.4 × 10 −10 at 95 % confidence level.
The lowest published limits are B(B 0 s → µ + µ − ) < 4.5 × 10 −9 and B(B 0 → µ + µ − ) < 1.0 × 10 −9 at 95 % confidence level (CL) from the LHCb collaboration using 1.0 fb −1 of data collected in pp collisions in 2011 at √ s = 7 TeV [8]. This Letter reports an update of this search with 1.1 fb −1 of data recorded in 2012 at √ s = 8 TeV. The analysis of 2012 data is similar to that described in Ref. [8] with two main improvements: the use of particle identification to select B 0 (s) → h + h − (with h ( ) = K, π) decays used to calibrate the geometrical and kinematic variables, and a refined estimate of the exclusive backgrounds. To avoid potential bias, the events in the signal region were not examined until all the analysis choices were finalized. The updated estimate of the exclusive backgrounds is also applied to the 2011 data [8] and the results re-evaluated. The results obtained with the combined 2011 and 2012 datasets supersede those of Ref. [8].
Candidate B 0 (s) → µ + µ − events are required to be selected by a hardware and a subsequent software trigger. The candidates are predominantly selected by single and dimuon triggers [17] and, to a smaller extent, by a generic b-hadron trigger [18]. Candidate events in the B + → J/ψK + control channel, with J/ψ → µ + µ − (inclusion of charged conjugated processes is implied throughout this Letter), are selected in a very similar way, the only difference being a different dimuon mass requirement in the final software trigger. The B 0 (s) → h + h − decays are predominantly selected by a hardware trigger based on the calorimeter transverse energy and subsequently by a generic b-hadron software trigger.
The B 0 (s) → µ + µ − candidates are selected by requiring two high quality muon candidates [19] displaced with respect to any pp interaction vertex (primary vertex, PV), and forming a secondary vertex (SV) with a χ 2 per degree of freedom smaller than 9 and separated from the PV in the downstream direction by a flight distance significance greater than 15. Only candidates with an impact parameter χ 2 , IPχ 2 (defined as the difference between the χ 2 of the PV formed with and without the considered tracks) less than 25 are considered. When more than one PV is reconstructed, that giving the smallest IPχ 2 for the B candidate is chosen. Tracks from selected candidates are required to have transverse momentum p T satisfying 0.25 < p T < 40 GeV/c and p < 500 GeV/c. Only B candidates with decay times smaller than 9 τ (B 0 s ) [20] and with invariant mass in the range [4900, 6000] MeV/c 2 are kept.
Dimuon candidates from elastic diphoton production are heavily suppressed by requiring p T (B) > 0.5 GeV/c. The surviving background comprises mainly random combinations of muons from semileptonic decays of two different b hadrons (bb → µ + µ − X, where X is any other set of particles).
Two channels, B + → J/ψK + and B 0 → K + π − , serve as normalization modes. The first mode has trigger and muon identification efficiencies similar to those of the signal, but a different number of tracks in the final state. The second mode has a similar topology, but is triggered differently. The selection of these channels is as close as possible to that of the signal to reduce the impact of potential systematic uncertainties.
The B 0 → K + π − selection is the same as for B 0 (s) → µ + µ − signal except for muon identification. The two tracks are nevertheless required to be within the muon detector acceptance.
The J/ψ → µ + µ − decay in the B + → J/ψK + normalization channel is also selected similarly to the B 0 (s) → µ + µ − signals, except for the requirements on the IPχ 2 and mass. Kaon candidates are required to have IPχ 2 > 25.
A two-stage multivariate selection, based on boosted decision trees [21,22] is applied to the B 0 (s) → µ + µ − candidates. A cut on the first multivariate discriminant, unchanged from Ref. [8], removes 80 % of the background while retaining 92 % of signal. The efficiencies of this cut for the signal and the normalization samples are equal within 0.2 % as determined from simulation.
The output of the second multivariate discriminant, called BDT, and the dimuon invariant mass are used to classify the selected candidates. The nine variables entering the BDT are the B candidate IP, the minimum IPχ 2 of the two muons with respect to any PV, the sum of the degrees of isolation of the muons (the number of good two-track vertices a muon can make with other tracks in the event), the B candidate decay time, p T , and isolation [23], the distance of closest approach between the two muons, the minimum p T of the muons, and the cosine of the angle between the muon momentum in the dimuon rest frame and the vector perpendicular to both the B candidate momentum and the beam axis.
The BDT discriminant is trained using simulated samples consisting of B 0 s → µ + µ − for signal and bb → µ + µ − X for background. The BDT response is defined such that it is approximately uniformly distributed between zero and one for signal events and peaks at zero for the background. The BDT response is independent of the invariant mass for signal inside the search window. The probability for a B 0 (s) → µ + µ − event to have a given BDT value is obtained from data using B 0 → K + π − , π + π − and B 0 s → π + K − , K + K − exclusive decays selected as the signal events and triggered independently of the tracks from B 0 (s) candidates. The invariant mass lineshape of the signal events is described by a Crystal Ball function [24]. The peak values for the B 0 s and B 0 mesons, m B 0 s and m B 0 , are obtained from the B 0 s → K + K − and B 0 → K + π − , B 0 → π + π − samples. The resolutions are determined by combining the results obtained with a power-law interpolation between the measured resolutions of charmonium and bottomonium resonances decaying into two muons with those obtained with a fit of the mass distri- MeV/c 2 and σ B 0 = 24.6 ± 0.4 MeV/c 2 , respectively. The transition point of the radiative tail is obtained from simulated B 0 s → µ + µ − events smeared to reproduce the mass resolution measured in data.
The B 0 s → µ + µ − and B 0 → µ + µ − yields are translated into branching fractions using where B norm represents the branching fraction, N norm the number of signal events in the normalization channel obtained from a fit to the invariant mass distribution, and N B 0 (s) →µ + µ − is the number of observed signal events. The factors f d(s) and f norm indicate the probabilities that a b quark fragments into a B 0 (s) meson and into the hadron involved in the given normalization mode, respectively. We assume f d = f u and use f s /f d = 0.256 ± 0.020 measured in pp collision data at √ s = 7 TeV [25]. This value is in agreement within 1.5 σ with that found at √ s = 8 TeV by comparing the ratios of the yields of [25] is found to be negligible for this analysis.
The efficiency sig(norm) for the signal (normalization channel) is the product of the reconstruction efficiency of the final state particles including the geometric detector acceptance, the selection efficiency and the trigger efficiency. The ratio of acceptance, reconstruction and selection efficiencies is computed using simulation. Potential differences between data and simulation are accounted for as systematic uncertainties. Reweighting techniques are used for all the distributions in the simulation that do not match those from data. The trigger efficiency is evaluated with data-driven techniques [26]. The observed numbers of B + → J/ψK + and B 0 → K + π − candidates in the 2012 dataset are 424 200 ± 1500 and 14 600 ± 1100, respectively. The two normalization factors α norm B 0 (s) →µ + µ − are in agreement within the uncertainties, and their weighted average, taking correlations into account, gives α B 0 s →µ + µ − = (2.52 ± 0.23) × 10 −10 and In total, 24 044 muon pairs with invariant mass between 4900 and 6000 MeV/c 2 pass the trigger and selection requirements. Given the measured normalization factors and assuming the SM branching fractions, the data sample is expected to contain about 14.
The BDT range is divided into eight bins with boundaries The B 0 → π − µ + ν µ and B 0 (s) → h + h − branching fractions are taken from Ref. [20]. The theoretical estimates of the Λ 0 b → pµ − ν µ and B 0 s → K − µ + ν µ branching fractions are taken from Refs. [27] and [28], respectively. The mass and BDT distributions of these modes are evaluated from simulated samples where the K → µ, π → µ and p → µ misidentification probabilities as a function of momentum and transverse momentum are those determined from D * + → D 0 π + , D 0 → K − π + and Λ → pπ − data samples. We use the Λ 0 b fragmentation fraction f Λ 0 b measured by LHCb [29] and account for its p T dependence.
A simultaneous unbinned maximum-likelihood fit to the mass projections in the BDT bins is performed on the mass sidebands to determine the number of expected combinatorial background events in the B 0 and B 0 s signal regions used in the derivation of the branching fraction limit. In this fit the parameters that describe the mass distributions of the exclusive backgrounds, their fractional yields in each BDT bin and their overall yields are limited by Gaussian constraints according to their expected values and uncertainties. The combinatorial background is parameterized with an exponential function with slope and normalization allowed to vary. The systematic uncertainty on the estimated number of combinatorial background events in the signal regions is determined by fluctuating the number of events observed in the sidebands according to a Poisson distribution, and by varying the exponential slope according to its uncer-tainty. The same fit is then performed on the full mass range to determine the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions, which are free parameters of the fit. The B 0 s → µ + µ − and B 0 → µ + µ − fractional yields in BDT bins are constrained to the BDT fractions calibrated with the B 0 (s) → h + h − sample. The parameters of the Crystal Ball functions that describe the mass lineshapes and the normalization factors are restricted by Gaussian constraints according to their expected values and uncertainties.
The compatibility of the observed distribution of events with that expected for a given branching fraction hypothesis is computed using the CL s method [33]. The method provides CL s+b , a measure of the compatibility of the observed distribution with the signal plus background hypothesis, CL b , a measure of the compatibility with the background-only hypothesis, and The invariant mass signal regions are divided into nine bins with boundaries m B 0 (s) ± 18, 30, 36, 48, 60 MeV/c 2 . In each bin of the two-dimensional space formed by the dimuon mass and the BDT output we count the number of observed candidates, and compute the expected number of signal and background events.
The comparison of the distributions of observed events and expected background events in the 2012 dataset results in p-values (1 − CL b ) of 9×10 −4 for the B 0 s → µ + µ − and 0.16 for the B 0 → µ + µ − decay, computed at the branching fraction values corresponding to CL s+b = 0.5. We observe an excess of B 0 s → µ + µ − candidates with respect to background expectation with a significance of 3.3 standard deviations. The simultaneous unbinned maximum-likelihood fit gives B(B 0 s → µ + µ − ) = (5.1 +2. 3 −1.9 (stat) +0.7 −0.4 (syst)) × 10 −9 . The statistical uncertainty reflects the interval corresponding to a change of 0.5 with respect to the maximum of the likelihood after fixing all the fit parameters to their expected values except the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions and the slope and normalization of the combinatorial background. The systematic uncertainty is obtained by subtracting in quadrature the statistical uncertainty from the total uncertainty obtained from the likelihood with all nuisance parameters left to vary according to their uncertainties. An additional systematic uncertainty of 0.16×10 −9 reflects the impact on the result of the change in the parameterization of the combinatorial background from a single to a double exponential, and is added in quadrature.
The expected and measured limits on the B 0 → µ + µ − branching fraction at 90 % and 95 % CL are shown in Table I. The expected limits are computed allowing for the presence of B 0 (s) → µ + µ − events according to the SM branching fractions, including cross-feed between the two modes.
The contribution of the exclusive background compo- This shift is compatible with the systematic uncertainty previously assigned to the background shape [8]. The values of the B 0 s → µ + µ − branching fraction obtained with the 2011 and 2012 datasets are compatible within 1.5 σ.
The 2011 and 2012 results are combined by computing the CL s and performing the maximum-likelihood fit simultaneously to the eight and seven BDT bins of the 2011 and 2012 datasets, respectively. The parameters that are considered 100 % correlated between the two datasets are f s /f d , B(B + → J/ψK + ) and B(B 0 → K + π − ), the transition point of the Crystal Ball function describing the signal mass lineshape, the mass distribution of the B 0 (s) → h + h − background, the BDT and mass distributions of the B 0 → π − µ + ν µ and B 0(+) → π 0(+) µ + µ − backgrounds and the SM predictions of the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions. The distribution of the expected and observed events in bins of BDT in the signal regions obtained from the simultaneous analysis of the 2011 and 2012 datasets, are summarized in Table II.
The expected and observed upper limits for the B 0 → µ + µ − channel obtained from the combined 2011+2012 datasets are summarized in Table I   is and is in agreement with the SM expectation. The invariant mass distribution of the B 0 (s) → µ + µ − candidates with BDT > 0.7 is shown in Fig. 2  at 90 % CL (95 % CL), where the lower and upper limit are the branching fractions evaluated at CL s+b = 0.95 (CL s+b = 0.975) and CL s+b = 0.05 (CL s+b = 0.025), respectively. These results are in good agreement with the lower and upper limits derived from integrating the profile likelihood obtained from the unbinned fit. In summary, a search for the rare decays B 0 s → µ + µ − and B 0 → µ + µ − is performed using 1.0 fb −1 and 1.1 fb −1 of pp collision data collected at √ s = 7 TeV and √ s = 8 TeV, respectively. The data in the B 0 search window are consistent with the background expectation and the world's best upper limit of B(B 0 → µ + µ − ) < 9.4 × 10 −10 at 95 % CL is obtained. The data in the B 0 s search window show an excess of events with respect to the background-only prediction with a statistical significance of 3.5 σ. A fit to the data leads to B(B 0 s → µ + µ − ) = (3.2 +1.5 −1.2 ) × 10 −9 which is in agreement with the SM prediction. This is the first evidence for the decay B 0 s → µ + µ − . put at our disposal by Yandex LLC (Russia), as well as to the communities behind the multiple open source software packages that we depend on.