Strong constraints on the rare decays Bs ->mu+ mu- and B0 ->mu+ mu-

A search for Bs ->mu+ mu- and B0 ->mu+ mu- decays is performed using 1.0 fb^-1 of pp collision data collected at \sqrt{s}=7 TeV with the LHCb experiment at the Large Hadron Collider. For both decays the number of observed events is consistent with expectation from background and Standard Model signal predictions. Upper limits on the branching fractions are determined to be BR(Bs ->mu+ mu-)<4.5 (3.8) x 10^-9 and BR(B0 ->mu+ mu-)<1.0 (0.81) x 10^-9 at 95% (90%) confidence level.

In this Letter, we report an analysis of the pp collision data recorded in 2011 by the LHCb experiment corresponding to an integrated luminosity of 1:0 fb À1 . This data set includes the 0:37 fb À1 used in the previous analysis [6]. In addition to the larger data set, improvements include an updated event selection, an optimized binning in the discriminating variables, and a reduction of the peaking background. The data already analyzed in Ref. [6] were reprocessed and, to avoid any potential bias, all the events in the signal region were blinded until all the analysis choices were finalized.
The LHCb detector [7] is a single-arm forward spectrometer covering the pseudorapidity range 2 < < 5. The detector includes a high precision tracking system consisting of a silicon-strip vertex detector, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream. The combined tracking system has a momentum resolution Áp=p that varies from 0.4% at 5 GeV=c to 0.6% at 100 GeV=c. Two ring-imaging Cherenkov detectors (RICH) are used to identify charged particles. Photon, electron, and hadron candidates are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter, and a hadronic calorimeter. Muons are identified by alternating layers of iron and multiwire proportional chambers.
The trigger consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage (high-level trigger [HLT]) that applies a full event reconstruction. Events with muon final states are triggered using two hardware trigger decisions: the single-muon decision (one muon candidate with transverse momentum p T > 1:5 GeV=c), and the dimuon decision (two muon candidates with p T;1 and p T;2 such that ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi p T;1 p T;2 p > 1:3 GeV=c). All tracks in the HLT are required to have a p T > 0:5 GeV=c. The single muon trigger decision in the HLT selects tracks with an impact parameter IP > 0:1 mm and p T > 1:0 GeV=c. The dimuon trigger decision requires þ À pairs with an invariant mass m > 4700 MeV=c 2 . Another trigger decision, designed to select J=c mesons, requires 2970 < m < 3210 MeV=c 2 . Events with purely hadronic final states are triggered by the hardware trigger if there is a calorimeter cluster with transverse energy E T > 3:5 GeV. HLT trigger decisions selecting generic b-hadron decays provide high efficiency for such final states.
The B 0 ðsÞ ! þ À selection requires two high quality muon candidates displaced with respect to any primary pp interation point (primary vertex, PV). The dimuon secondary vertex (SV) is required to be well measured (with a 2 per degree of freedom smaller than 9.0), downstream, and separated from the PV by a distance-of-flight significance greater than 15. When more than one PV is reconstructed, the one giving the minimum IP significance for the B candidate is chosen. Only candidates with IP=ðIPÞ < 5 are kept. Combinations with poorly reconstructed tracks are removed by requiring p < 500 GeV=c and 0:25 < p T < 40 GeV=c for all tracks from the selected candidates.
Only B candidates with decay times smaller than 9 Â ðB 0 s Þ [8] are kept. Finally, according to the simulation, approximately 90% of dimuon candidates coming from elastic diphoton production are removed by requiring a minimum p T of the B candidate of 500 MeV=c. The surviving background mainly comprises random combinations of muons from semileptonic b-hadron decays (b " b ! þ À X, where X is any other set of particles). Three channels, B þ ! J=c K þ , B 0 s ! J=c , and B 0 ! K þ À (inclusion of charged conjugated processes is implied throughout this Letter) serve as normalization modes. The first two have trigger and muon identification efficiencies similar to those of the signal, but a different number of tracks in the final state. The third channel has a similar topology, but is selected by different triggers. The selection of these channels is designed to be as similar as possible to that of the signal to reduce the impact of common systematic uncertainties. An inclusive B 0 ðsÞ ! h þ h 0À sample (where h, h 0 can be a pion or a kaon) is the main control sample. The selection is the same as for B 0 ðsÞ ! þ À signal candidates, except for the muon identification requirement. To ensure similar selection efficiencies for the B 0 ! K þ À and B 0 ðsÞ ! þ À channels, tracks from the B 0 ! K þ À decay are required to be in the muon detector acceptance. The J=c ! þ À decay in the B þ ! J=c K þ and B 0 s ! J=c normalization channels is also selected as B 0 ðsÞ ! þ À signal, except for the requirements on its IP and mass. Kaon candidates are required to be identified by the RICH detectors and to pass IP selection criteria.
A multivariate selection (MVS), based on a boosted decision tree [9], removes 80% of the residual background, while retaining 92% of the signal. Applying this selection improves the performance of the main multivariate algorithm described below. The six variables entering the MVS, ordered by their background rejection power, are: the angle between the direction of the momentum of the B candidate and the direction defined by the vector joining the secondary and the primary vertices, the B candidate IP and its vertex 2 , the minimum IP of the muons with respect to any PV, the minimum distance between the two daughter tracks and the 2 of the SV. The B 0 ðsÞ ! h þ h 0À mass sidebands have been used to check that the distribution of the MVS output is similar for data and simulation. The same selection is applied (using, when necessary, slightly modified variable definitions) to the normalization samples. The efficiencies for the signal and the normalization samples are equal within 0.2% according to the simulation.
In total, 17321 muon pairs with invariant mass between 4900 and 6000 MeV=c 2 pass the trigger and selection requirements. Given the measured b " b cross section [10] and assuming SM rates, this data sample is expected to contain 11.6 B 0 s ! þ À and 1.3 B 0 ! þ À decays. The selected candidates are classified in a binned twodimensional space formed by the dimuon invariant mass and the output of another boosted decision tree (BDT), described in detail below. In the following, we employ BDT to indicate the algorithm or its output, depending on the context. The invariant mass line shape of the signal events is described by a crystal-ball function [11]. The peak values for the B 0 s and B 0 mesons, m B 0 s and m B 0 , are obtained from the B 0 s ! K þ K À and B 0 ! K þ À samples [12]. The resolutions are extracted from data with a power-law interpolation between the measured resolutions of charmonium and bottomonium resonances decaying into two muons. Each resonance is fitted with the sum of two crystal-ball functions with common mean values and resolutions, but different parameters describing the tails. The results of the interpolation at m B 0 s and m B 0 are ðm B 0 s Þ ¼ 24:8 AE 0:8 MeV=c 2 and ðm B 0 Þ ¼ 24:3 AE 0:7 MeV=c 2 . They are in agreement with those found using B 0 ! K þ À and B 0 s ! K þ K À exclusive decays. The transition point of the radiative tail is obtained from simulated B 0 s ! þ À events reweighted to reproduce the mass resolution measured in data.
Geometrical and kinematic information not fully exploited in the selection is combined via the BDT for which nine variables are employed [6]. Ordered by their background rejection power, they are: the B candidate IP, the minimum IP significance, the sum of the degrees of isolation of the muons (the number of good two-track vertices a muon can make with other tracks in the event), the B candidate decay time, p T , and degree of isolation [13], the distance of closest approach between the two muons, the minimum p T of the muons, and the cosine of the angle between the muon momentum in the dimuon rest frame and the vector perpendicular to the B candidate momentum and to the beam axis. No data were used for the choice of the variables and the subsequent training of the BDT, to avoid biasing the results. Instead the BDT was trained using simulated samples (B 0 ðsÞ ! þ À for signal and b " b ! þ À X for background). The BDT output is independent of the invariant mass for signal inside the search window. It is defined such that for the signal it is approximately uniformly distributed between zero and one, while for the background it peaks at zero.
The probability for a signal event to have a given BDT value is obtained from data using an inclusive B 0 ðsÞ ! h þ h 0À sample. Only events triggered independently of the presence of any track from the signal candidates are considered. The number of B 0 ðsÞ ! h þ h 0À signal events in each BDT bin is determined by fitting the hh 0 invariant mass distribution. The maximum spread in the fractions of the yields going into each bin, obtained by fitting the same data set with different signal and background models, is used to evaluate the systematic uncertainty on the signal BDT probability distribution function [6].
The binning of the BDT and invariant mass distributions is reoptimized with respect to Ref. [6], using simulation, to PRL 108, 231801 (2012) P H Y S I C A L R E V I E W L E T T E R S week ending 8 JUNE 2012 maximize the separation between the median of the test statistic distribution expected for background and SM B 0 s ! þ À signal and that expected for background only. The chosen number and size of the bins are a compromise between maximizing the number of bins and the necessity to have enough B 0 ðsÞ ! h þ h 0À events to calibrate the B 0 s ! þ À BDT and enough background in the mass sidebands (see below) in each bin to estimate the combinatorial background in the B 0 s and B 0 mass regions. The BDT range is thus divided into eight bins (see Table I) and the invariant mass range into nine bins with boundaries are defined by m B 0 ðsÞ AE 18; 30; 36; 48; 60 MeV=c 2 . This binning improves the test statistic separation by about 14% at the SM rate with respect to Ref. [6]; over 97% of this separation comes from the bins with BDT > 0:5.
We select events in the invariant mass range [4900 MeV=c 2 , 6000 MeV=c 2 ]. The boundaries of the signal regions are defined as m B 0 ðsÞ AE 60 MeV=c 2 . The low-mass sideband is potentially polluted by cascading b ! c ! X decays below 4900 MeV=c 2 and peaking background from B 0 ðsÞ ! h þ h 0À candidates with the two hadrons misidentified as muons above 5000 MeV=c 2 . The number of expected combinatorial background events in each BDT and invariant mass bin inside the signal regions is determined from data by fitting to an exponential function events in the mass sidebands defined by ½4900 MeV=c 2 ; 5000 MeV=c 2 and [m B 0 s þ 60 MeV=c 2 , 6000 MeV=c 2 ]. The systematic uncertainty on the estimated number of combinatorial background events is computed by fluctuating with a Poissonian distribution the number of events measured in the sidebands, and by varying within AE1 the value of the exponent. As a cross-check, another model, the sum of two exponential functions, has been used to fit the events in different ranges of sidebands providing consistent background estimates inside the signal regions. An additional systematic uncertainty is introduced where the yields in the signal regions differ by more than 1 between the fit models.
Peaking backgrounds from B 0 ðsÞ ! h þ h 0À events have been evaluated by folding the K ! and ! misidentification rates extracted from a D 0 ! K À þ sample from data in bins of p and p T into the spectrum of selected simulated B 0 ðsÞ ! h þ h 0À events. The mass line shape of the peaking background is obtained from a simulated sample of doubly misidentified B 0 ðsÞ ! h þ h 0À events. In total, 0:5 þ0:2 À0:1 (2:6 þ1:1 À0:4 ) doubly misidentified B 0 ðsÞ ! h þ h 0À events are expected in the B 0 s (B 0 ) signal mass windows. The contributions of B þ c ! J=c ð þ À Þ þ and B 0 s ! þ À exclusive decays have been found to be negligible with respect to the combinatorial and B 0 ðsÞ ! h þ h 0À backgrounds.
The B 0 s ! þ À and B 0 ! þ À yields are translated into branching fractions using where f dðsÞ and f norm are the probabilities that a b quark fragments into a B 0 ðsÞ and into the hadron involved in the given normalization mode, respectively. We use f s =f d ¼ 0:267 þ0:021 À0:020 [14] and we assume f d ¼ f u . With B norm we indicate the branching fraction and with N norm the number of signal events in the normalization channel obtained from a fit to the invariant mass distribution. The efficiency sigðnormÞ for the signal (normalization channel) is the product of the reconstruction efficiency of all the final state particles of the decay including the geometric acceptance of the detector, the selection efficiency for reconstructed events, and the trigger efficiency for reconstructed and selected events. The ratio of acceptance and reconstruction efficiencies are computed using the Monte Carlo simulation. The differences between the simulation and data are included as systematic uncertainties. The selection efficiencies are determined using Monte Carlo simulation and cross-checked with data. Reweighting techniques have been used for all the Monte Carlo distributions that do not match those from data. The trigger efficiency is evaluated with data driven techniques. Finally, N B 0 ðsÞ ! þ À is the number of observed signal events. The observed numbers of B þ ! J=c K þ , B 0 s ! J=c and B 0 ! K þ À candidates are 340 100 AE 4500, 19 040 AE 160 and 10 120 AE 920, respectively. The three normalization factors are in agreement within the uncertainties and their weighted average, taking correlations into account, gives norm B 0 s ! þ À ¼ ð3:19 AE 0:28Þ Â 10 À10 and norm B 0 ! þ À ¼ ð8:38 AE 0:39Þ Â 10 À11 .
For each bin in the two-dimensional space formed by the invariant mass and the BDT, we count the number of candidates observed in the data, and compute the expected number of signal and background events.
The systematic uncertainties in the background and signal predictions in each bin are computed by fluctuating the mass and BDT shapes and the normalization factors along the Gaussian distributions defined by their associated uncertainties. The inclusion of the systematic uncertainties increases the B 0 ! þ À and B 0 s ! þ À upper limits by less than $5%.
The results for B 0 s ! þ À and B 0 ! þ À decays, integrated over all mass bins in the corresponding signal region, are summarized in Table I. The distribution of the invariant mass for BDT > 0:5 is shown in Fig. 1 for B 0 s ! þ À and B 0 ! þ À candidates. The compatibility of the observed distribution of events with that expected for a given branching fraction hypothesis is computed using the CL s method [15]. The method provides CL sþb , a measure of the compatibility of the observed distribution with the signal plus background hypothesis, CL b , a measure of the compatibility with the background-only hypothesis, and CL s ¼ CL sþb =CL b .
The expected and observed CL s values are shown in Fig. 2 for the B 0 s ! þ À and B 0 ! þ À channels, each as a function of the assumed branching fraction. The expected and measured limits for B 0 s ! þ À and B 0 ! þ À at 90% and 95% C.L. are shown in Table II. The expected limits are computed allowing the presence of B 0 ðsÞ ! þ À events according to the SM branching fractions, including cross feed between the two modes.
The comparison of the distributions of observed events and expected background events results in a p-value (1 À C:L: b ) of 18% (60%) for the B 0 s ! þ À (B 0 ! þ À ) decay, where the C:L: b values are those corresponding to C:L: sþb ¼ 0:5.
A simultaneous unbinned likelihood fit to the mass projections in the eight BDT bins has been performed to determine the B 0 s ! þ À branching fraction. The signal fractional yields in BDT bins are constrained to the BDT FIG. 1 (color online). Distribution of selected candidates (black points) in the (left) B 0 s ! þ À and (right) B 0 ! þ À mass window for BDT > 0:5, and expectations for, from the top, B 0 ðsÞ ! þ À SM signal (gray), combinatorial background (light gray), B 0 ðsÞ ! h þ h 0À background (black), and cross feed of the two modes (dark gray). The hatched area depicts the uncertainty on the sum of the expected contributions.  fractions calibrated with the B 0 ðsÞ ! h þ h 0À sample. The fit gives BðB 0 s ! þ À Þ ¼ ð0:8 þ1:8 À1:3 Þ Â 10 À9 , where the central value is extracted from the maximum of the logarithm of the profile likelihood and the uncertainty reflects the interval corresponding to a change of 0.5. Taking the result of the fit as a posterior, with a positive branching fraction as a flat prior, the probability for a measured value to fall between zero and the SM expectation is 82%, according to the simulation. The one-sided 90%, 95% C.L., and the compatibility with the SM predictions obtained from the likelihood, are in agreement with the C:L: s results. The results of a fully unbinned likelihood fit method are in agreement within uncorrelated systematic uncertainties. The largest systematic uncertainty is due to the parametrization of the combinatorial background BDT.
In summary, a search for the rare decays B 0 s ! þ À and B 0 ! þ À has been performed on a data sample corresponding to an integrated luminosity of 1:0 fb À1 . These results supersede those of our previous publication [6] and are statistically independent of those obtained from data collected in 2010 [12]. The data are consistent with both the background-only hypothesis and the combined background plus SM signal expectation at the 1 level. For these modes, we set the most stringent upper limits to date: BðB 0 s ! þ À Þ < 4:5 Â 10 À9 and BðB 0 ! þ À Þ < 1:03 Â 10 À9 at 95% C.L. We express our gratitude to our colleagues in the CERN accelerator departments for the excellent performance of the LHC. We thank the technical and administrative staff at CERN and at the LHCb institutes, and acknowledge support from the national agencies: CAPES, CNPq, FAPERJ, and FINEP (Brazil); CERN; NSFC (China); CNRS/IN2P3 (France); BMBF, DFG, HGF, and MPG (Germany); SFI (Ireland); INFN (Italy); FOM and NWO (The Netherlands); SCSR (Poland); ANCS (Romania); MinES of Russia and Rosatom (Russia); MICINN, XuntaGal, and GENCAT (Spain); SNSF and SER (Switzerland); NAS Ukraine (Ukraine); STFC (United Kingdom); NSF (USA). We also acknowledge the support received from the ERC under FP7 and the Region Auvergne.