Analysis of neutral $B$-meson decays into two muons

Branching fraction and effective lifetime measurements of the rare decay $B^0_s\to\mu^+\mu^-$ and searches for the decays $B^0\to\mu^+\mu^-$ and $B^0_s\to\mu^+\mu^-\gamma$ are reported using proton-proton collision data collected with the LHCb detector at centre-of-mass energies of $7$ TeV, $8$ TeV and $13$ TeV, corresponding to a luminosity of $9$ fb$^{-1}$. The branching fraction ${\mathcal{B}}(B^0_s\to\mu^+\mu^-)=\left(3.09^{+0.46+0.15}_{-0.43-0.11}\right)\times 10^{-9}$ and the effective lifetime $\tau(B^0_s\to\mu^+\mu^-)=(2.07\pm 0.29\pm 0.03)$ are measured, where the first uncertainty is statistical and the second systematic. No significant signal for $B^0\to\mu^+\mu^-$ and $B^0_s\to\mu^+\mu^-\gamma$ decays is found and upper limits $\mathcal{B}(B^0\to\mu^+\mu^-)<2.6\times 10^{-10}$ and $\mathcal{B}(B^0_s\to\mu^+\mu^-\gamma)<2.0\times 10^{-9}$ at the 95% CL are determined, where the latter is limited to the range $m_{\mu\mu}>4.9$ GeV$/c^2$. The results are in agreement with the Standard Model expectations.

The B 0 → µ + µ − and B 0 s → µ + µ − decays can be accompanied by the emission of final-state radiation (FSR) from the muons or initial-state radiation (ISR) from the valence quarks, with negligible interference between the two processes [14][15][16].Photons from FSR are predominantly soft and their contribution is included experimentally in the reconstructed B 0 (s) mass shape as a radiative tail.On the contrary, the ISR process, indicated as B 0 s → µ + µ − γ in this Letter, is characterised by a larger momentum of the photon.This contribution, searched for in the present analysis for the first time, has a SM branching fraction of the order of 10 −10 for a dimuon mass above the lower bound of the search window, 4.9 GeV/c 2 , and can be affected by new physics contributions in a different way than the B 0 s → µ + µ − decay [14,15,[17][18][19][20][21][22].Throughout this Letter, B 0 (s) → µ + µ − candidates include B 0 s → µ + µ − , B 0 → µ + µ − or B 0 s → µ + µ − γ decays with the dimuon pair selected in the mass range [4900,6000] MeV/c 2 and the photon not reconstructed [16].The contribution from B 0 → µ + µ − γ decays is considered negligible compared to that from B 0 s → µ + µ − γ because of the additional CKM suppression and the mass shift to lower values.The B 0 s mass eigenstates are characterised by a sizeable difference in their decay widths (∆Γ s ) compared to their average value (1/τ Bs ), such that y s ≡ τ Bs ∆Γ s /2 = 0.065 ± 0.003 [23].The effective lifetime, defined as the average decay time, is given by [24] , where A µµ ∆Γs = 1 (−1) if only the heavy (light) B 0 s eigenstate can decay to the µ + µ − final state.In the SM A µµ ∆Γs = 1, but any value in the range [−1, 1] may be possible in new physics scenarios.As a consequence, the effective lifetime of B 0 s → µ + µ − decays can probe new physics in a way complementary to the branching fraction [25].
This Letter reports improved measurements of the B 0 s → µ + µ − and B 0 → µ + µ − time-integrated branching fractions and of the B 0 s → µ + µ − effective lifetime, which supersede the previous LHCb results [10], and a first search for B 0 s → µ + µ − γ decays.
A more comprehensive description of these measurements is reported in a companion article [26].Inclusion of charge-conjugated processes is implied throughout the Letter.
Results are based on data collected with the LHCb detector in the years 2011-2012 and 2015-2018, corresponding to an integrated luminosity of 1 fb −1 of proton-proton (pp) collisions at a centre-of-mass energy √ s = 7 TeV, 2 fb −1 at √ s = 8 TeV and 6 fb −1 recorded at √ s = 13 TeV.The first two data sets are referred to as Run 1 and the latter as Run 2.
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Refs.[27,28].The simulated events used in this analysis are produced with the software described in Refs.[29][30][31][32][33] taking into account the variations of the accelerator and detector conditions over time.In particular, FSR is simulated using Photos [34].ISR B 0 s → µ + µ − γ decays are simulated according to the study in Ref. [14].The analysis strategy is similar to that employed in Ref. [10], optimised to enhance the sensitivity to both B 0 s and B 0 decays to µ + µ − .After loose trigger and selection requirements, B 0 (s) → µ + µ − candidates are classified based on the dimuon mass and the output variable, BDT, of a boosted decision tree classifier [35,36] designed to distinguish signal from combinatorial background.To avoid the experimenter's bias, the candidates in the region [5200, 5445] MeV/c 2 , where the B 0 s → µ + µ − and B 0 → µ + µ − signal processes peak, were not examined until the full procedure had been finalised.The signal yields are determined from a maximum-likelihood fit to the dimuon mass distribution of the candidates in regions of BDT, and are converted into branching fractions using the decays B 0 → K + π − and B + → J/ψK + , with J/ψ → µ + µ − , as normalisation modes.These decays have been chosen for their relatively large and well-measured branching fractions and because they share the same topology or a dimuon pair in the final state with the signal.The B 0 s → µ + µ − effective lifetime is measured from the background-subtracted decay-time distribution of signal candidates.
Events are selected by a hardware trigger followed by a software trigger [37].The B 0 (s) → µ + µ − candidates are predominantly selected by single-muon and dimuon triggers.The B + → J/ψK + candidates are selected in the same way except for a different dimuon mass requirement in the software trigger.Candidate B 0 (s) → h + h − decays, with h ( ) = π or K, are used as control and normalisation channels and are triggered independently of the B 0 (s) decay products to avoid selection biases.The B 0 (s) → µ + µ − candidates are reconstructed by combining two oppositely charged tracks with transverse momentum with respect to the proton beam direction, p T , in the range 0.25 < p T < 40 GeV/c, momentum p < 500 GeV/c, and high-quality muon identification [38].The muon candidates are required to be inconsistent with originating from any primary pp interaction vertex (PV) and to form a good quality secondary vertex well displaced from any PV.In the selection, B 0 (s) candidates must have a decay time less than 13.25 ps, p T > 0.5 GeV/c and they must be consistent with originating from at least one PV.A B 0 (s) candidate is rejected if either of the two candidate muons combined with any other oppositely charged muon candidate in the event has a mass consistent with the J/ψ mass [39].The B 0 (s) → h + h − selection is the same as that of B 0 (s) → µ + µ − , except that the muon identification criteria are replaced with hadron identification requirements and the J/ψ veto is not applied.The B + → J/ψK + decay is reconstructed by combining a muon pair, consistent with a J/ψ from a detached vertex, and a kaon candidate inconsistent with originating from any PV in the event.The selection criteria for signal and normalisation candidates include a loose requirement on the response of a different multivariate classifier, described in Refs.[26,40].
The selected events are dominated by combinatorial background, mainly composed of muons originating from two semileptonic b-hadron decays.The separation between signal and combinatorial background is achieved by means of the BDT classifier, which is optimised using simulated samples of B 0 s → µ + µ − events for signal and of b b → µ + µ − X events for background.The classifier combines information from the following input variables: ∆φ 2 + ∆η 2 , where ∆φ and ∆η are the azimuthal angle and pseudorapidity differences between the two muon candidates; the minimum χ 2 IP of the two muons candidates with respect to the PV B , where PV B is the PV most compatible with the B 0 (s) candidate trajectory and χ 2 IP is defined as the difference between the vertex-fit χ 2 of the PV formed with and without the particle in question; the angle between the direction of the B 0 (s) candidate momentum and the vector joining the B 0 (s) decay vertex and PV B ; the B 0 (s) candidate vertex-fit χ 2 and impact parameter significance with respect to the PV B ; and two isolation variables that quantify how much the other tracks of the event are likely to originate from the same hadron decay as the signal tracks.The BDT variable is constructed to be approximately uniform in the range [0,1] for signal, and to peak strongly at zero for background.Its linear correlation with the dimuon mass is below 5%.The Run 1 and Run 2 data sets are each divided into six subsets based on BDT regions with boundaries 0.0, 0.25, 0.4, 0.5, 0.6, 0.7 and 1.0; candidates having BDT < 0.25 are not included in the fit to the dimuon mass distribution.The mass distribution of the B 0 (s) → µ + µ − candidates with BDT > 0.5 is shown in Fig. 1.
The BDT distributions of B 0 (s) → µ + µ − decays are calibrated using simulated samples which have been reweighted to improve the agreement with the data.The p T , η and χ 2 IP quantities of simulated B 0 and B 0 s samples are corrected [41] using data samples of B + → J/ψK + and B 0 s → J/ψφ decays, respectively.The event occupancy is also corrected, separately for each BDT region, by comparing the fraction of B + → J/ψK + candidates in four intervals of the number of tracks in simulated events and in data.To align the reconstruction with that of the B 0 s → µ + µ − signal, the BDT response for the B + → J/ψK + candidates is evaluated using the information from the final state muons and the B + candidate, with two exceptions: the B vertex-fit χ 2 is replaced with that of the J/ψ, and the muon isolation variables are computed without considering the final-state kaon.The effect of the trigger selection on the BDT distribution is estimated using control channels in data.The resulting B 0 → µ + µ − and B 0 s → µ + µ − BDT variable distributions are found to be compatible with that of B 0 → K + π − decays selected in data when corrected for the different trigger and particle identification selection and, in the case of B 0 s → µ + µ − , the different lifetime.
The mass distributions of the B 0 s → µ + µ − and B 0 → µ + µ − signals are described by two-sided Crystal Ball functions [42] with core Gaussian parameters calibrated from the mass distributions of B 0 s → K + K − and B 0 → K + π − data samples, respectively.A mass resolution of about 22 MeV/c 2 is determined by interpolating the measured resolutions of charmonium and bottomonium resonances decaying into two muons.The radiative tails are obtained from simulation [43].Small differences in the resolution and tail parameters of the mass shape for the different BDT regions are taken into account.The mass distribution of the B 0 s → µ + µ − γ decays is described with a threshold function modelled on simulated events that were generated using the theoretical predictions of Refs.[14,15], convoluted with the experimental resolution.
The signal branching fractions are determined using the relation where →µ + µ − is the signal yield determined in the mass fit, N norm is the number of selected normalisation decays (B + → J/ψK + or B 0 → K + π − ), B norm the corresponding branching fraction [44], and sig ( norm ) is the total efficiency for the signal (normalisation) channel.For each signal mode, the two single event sensitivities, α norm (s) meson.The value of f s /f d has been measured by LHCb to be 0.254 ± 0.008 in pp collision data at √ s = 13 TeV, while the average value in Run 1 is lower by a factor of 1.064 ± 0.007 [45].The fragmentation probabilities for the B 0 and B + are assumed to be equal, hence f norm = f d for both normalisation modes.
The acceptance, reconstruction and selection efficiencies are computed with samples of simulated events generated with the decay-time distribution predicted by the SM.The tracking and particle identification efficiencies are determined using control channels in data [46,47].The trigger efficiencies are evaluated with control channels in data [48].
The yields of selected B + → J/ψK + and B 0 → K + π − decays are (4733 ± 3) × 10 The combinatorial background is distributed exponentially over the whole mass range.In addition, the B 0 and B 0 s signal regions and the low-mass sideband are populated by background from specific b-hadron decays divided into two categories: those with the misidentification of at least one hadron as a muon and those where two real muons are present and the decay is partially reconstructed.The first category includes and Λ 0 b → pµ − ν µ decays, of which branching fractions are taken from Refs.[44,49,50].The mass and BDT distributions of these decays are determined from simulated samples after calibrating the K → µ, π → µ and p → µ momentum-dependent misidentification probabilities using control channels in data.An independent estimate of the B 0 (s) → h + h − background yield is obtained by extracting the yields of misidentified B 0 (s) → h + h − decays from the mass spectrum of π + µ − or K + µ − combinations in data, and rescaling the observed yields according to the misidentification probabilities.The difference with respect to the result from the first method is assigned as a systematic uncertainty.The second category of background in the low-mass sideband includes the decays B + c → J/ψµ + ν µ , with J/ψ → µ + µ − , and B 0(+) → π 0(+) µ + µ − , which have at least two muons in the final state.The rate of B + c → J/ψµ + ν µ decays is evaluated from Refs.[51,52] and those of B 0(+) → π 0(+) µ + µ − decays from Refs.[53,54].The expected yields of the background contributions originating from specific processes are estimated by normalising to the B + → J/ψK + decay, except for the B 0 (s) → h + h − decays, which are normalised to the B 0 → K + π − channel.Their expected yields with BDT > 0.25 in the full mass range are 37 branching fractions are determined with a simultaneous unbinned maximum-likelihood fit [55] to the dimuon mass distribution in the BDT regions of the Run 1 and Run 2 data sets, with BDT > 0.25.The fractions of B 0 (s) → µ + µ − yield in each BDT region and the parameters of the Crystal Ball functions [42] describing the shapes of the mass distribution are Gaussian constrained according to their expected values and uncertainties.The combinatorial background in each BDT region is described by an exponential function with the yield and slope allowed to vary freely, but the slope parameter is common to all regions within a given data set.Each other background is included as a separate component in the fit.Their yields as well as the fractions in each BDT region are Gaussian-constrained according to their expected values, while their mass shapes are determined from simulation and fixed in the fit, separately in each BDT region.Figure 1 shows the fit results projected on the dimuon mass distribution for BDT > 0.5.
The branching fractions of the B 0 s → µ + µ − , B 0 → µ + µ − and B 0 s → µ + µ − γ decays obtained from the fit are The statistical uncertainty is obtained by re-running the fit with all nuisance parameters fixed to the values found in the default fit.The systematic uncertainties of B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) are dominated by the uncertainty on f s /f d (3%) and the knowledge of the background from specific processes (9%), respectively.The correlation between the B 0 → µ + µ − and B 0 s → µ + µ − branching fractions is −11% while that between the B 0 s → µ + µ − γ and B 0 → µ + µ − (B 0 s → µ + µ − ) branching fractions is −25% (9%).Two-dimensional profile likelihoods are evaluated by taking the ratio of the likelihood value of a fit where the parameters of interest are fixed and the likelihood value of the standard fit.They are shown in Figure 2 for the possible combinations of two branching fractions.
An excess of B 0 s → µ + µ − decays with respect to the expectation from background is observed with a significance of about 10 standard deviations (σ), while the significance of the B 0 → µ + µ − signal is 1.7σ, as determined using Wilks' theorem [56] from the difference in likelihood between fits with and without the specific signal component.The negative fluctuation of the B 0 s → µ + µ − γ signal has a 1.6σ significance.Since the B 0 → µ + µ − and B 0 s → µ + µ − γ signals are not significant, an upper limit on each branching fraction is set using the CL s method [57] with a profile likelihood ratio as a one-sided test statistic [58].The likelihoods are computed with the nuisance parameters Gaussian-constrained to their fit values.The test statistic is then evaluated on an ensemble of pseudoexperiments where the nuisance parameters are floated according to their uncertainties.The resulting upper limit on B(B 0 → µ + µ − ) is 2.6 × 10 −10 at 95% CL, obtained without constraining the B 0 s → µ + µ − γ yield.Similarly, the upper limit on B(B 0 s → µ + µ − γ) with m µµ > 4.9 GeV/c 2 is evaluated to be 2.0 × 10 −9 at 95% CL.Fixing the B 0 s → µ + µ − γ signal to zero, the B 0 s → µ + µ − branching fraction increases by about 2% and the upper limit on B(B 0 → µ + µ − ) decreases by about 10%.
The selection efficiency of B 0 s → µ + µ − decays depends on the lifetime, introducing a model dependence in the measured time-integrated branching fraction.In the fit the SM value for τ µ + µ − , 1.620 ± 0.007 ps [44], is assumed, corresponding to A µµ ∆Γs = 1.The model dependence is evaluated by repeating the fit under the assumptions A µµ ∆Γs = 0 and −1, finding an increase of the branching fraction with respect to the SM hypothesis of 4.7% and 10.9%, respectively.The dependence is approximately linear in the physically allowed A µµ ∆Γs range.A similar dependence is present for the B 0 s → µ + µ − γ decay with a negligible impact on the branching fraction limit.
The criteria used to select data for the B 0 s → µ + µ − lifetime measurement differ slightly from those used in the branching fraction measurement.As shown in Fig. 1, the contribution from the misidentified background is negligible under the peak, and therefore a narrower dimuon mass range of [5320, 6000] MeV/c 2 is selected, while particleidentification requirements are relaxed slightly due to the lower expected contamination from the misidentified background in the B 0 s → µ + µ − signal region, with a corresponding increase in signal efficiency.Finally, candidate B 0 s → µ + µ − decays are required to fall into two trigger categories: the trigger requirements must be satisfied entirely either by the B 0 s → µ + µ − candidates themselves, or by objects from the pp collision that do not branching fraction is limited to the range m µµ > 4.9 GeV/c 2 .The measured central values of the branching fractions are indicated with a blue dot.The profile likelihood contours for 68%, 95% and 99% CL regions of the result are shown as blue contours, while in the top plot the brown contours indicate the previous measurement [10] and the red cross shows the SM prediction.
form part of the B 0 s → µ + µ − candidate.These more restrictive trigger requirements are imposed in order to improve the modelling of the decay-time dependence of the trigger efficiency in simulation.
In order to determine the B 0 s → µ + µ − effective lifetime the data are divided into two BDT regions [0.35, 0.55] and [0.55, 1.00], with boundaries optimised to achieve the best precision.Fits are performed to the dimuon mass distribution in each BDT region in order to extract background-subtracted decay time distributions using the sPlot technique [59].
The mass fits used in the background subtraction include B 0 s → µ + µ − and combinatorial background components, where the signal is modelled with the same function as in the branching fraction analysis and the background with exponential functions, with freelyfloating slope parameters in each BDT region.The correlation between the reconstructed mass and the reconstructed decay time of the selected candidates is consistent with zero in both data and simulation, as required by the sPlot technique.
A simultaneous fit is then performed to the two background-subtracted decay-time distributions, where each distribution is modelled by a single exponential multiplied by an acceptance function that models the decay time dependence of the reconstruction and selection efficiency.The acceptance functions are determined in each BDT region by fitting parametric functions to the efficiency distributions of simulated B 0 s → µ + µ − decays that have been weighted in order to improve the agreement with the data.The correction for the acceptance is validated by measuring the lifetimes of B 0 → K + π − and B 0 s → K + K − decays in data.The resulting values are 1.510 ± 0.015 ps and 1.435 ± 0.026 ps, respectively, where uncertainties are statistical only.These are consistent with the world averages [44].The statistical uncertainty on the measured B 0 s → K + K − lifetime is taken as the systematic uncertainty associated with the use of simulated events to determine the B 0 s → µ + µ − acceptance function.A number of sources of systematic bias are evaluated using a large number of simulated pseudoexperiments.The fit procedure is found to produce an unbiased estimate of the lifetime with uncertainties that provide the correct coverage.The effect of the contamination from B 0 → µ + µ − , B → h + h − and semileptonic b-hadron decays in the mass fit is found to introduce a small bias of up to 0.012 ps.The effect of the acceptance on the relative admixture of light and heavy mass eigenstates in the decay-time distribution is found to be negligible.Likewise, the uncertainty in the decay-time distribution of the combinatorial background, the production asymmetry between B 0 s and B 0 s mesons and the mismodelling of the acceptance function in simulation is found to have a small effect on the final result.Together, these sources result in a systematic uncertainty of 0.031 ps, which is dominated by the uncertainty on the measured B 0 s → K + K − lifetime.The mass distributions of the selected B 0 s → µ + µ − candidates are shown in Fig. 3 (top) for the two BDT regions.Figure 3 (bottom) shows the corresponding background-subtracted B 0 s → µ + µ − decay-time distribution with the fit function superimposed [55].The effective lifetime is found to be 2.07 ± 0.29 ± 0.03 ps, where the first uncertainty is statistical and the second systematic.This value lies outside the range between the lifetimes of the light (A ∆Γ = −1) and heavy (A ∆Γ = +1) mass eigenstates, which are τ L = 1.423 ± 0.005 ps and τ H = 1.620 ± 0.007 ps [44], but is consistent with these values at 2.2 and 1.5 standard deviations, respectively.
In summary, a new measurement of the rare decay B 0 s → µ + µ − and a search for B 0 → µ + µ − and B 0 s → µ + µ − γ decays has been performed using the full dataset collected by the LHCb experiment during Run 1 and Run 2, corresponding to a total integrated luminosity of 9 fb −1 .The time-integrated branching fraction of B 0 s → µ + µ − is measured to be 3.09 + 0.46 + 0.  the latter is limited to the range m µµ > 4.9 GeV/c 2 .The results are in agreement with the SM predictions and can be used to further constrain possible new physics contributions to these observables.

Figure 2 :
Figure 2: Two-dimensional profile likelihood of the branching fractions for the decays (top)B 0 s → µ + µ − and B 0 → µ + µ − , (bottom left) B 0 → µ + µ − and B 0 s → µ + µ − γ and (bottom right) B 0 s → µ + µ − and B 0 s → µ + µ − γ.The B 0 s → µ + µ − γbranching fraction is limited to the range m µµ > 4.9 GeV/c 2 .The measured central values of the branching fractions are indicated with a blue dot.The profile likelihood contours for 68%, 95% and 99% CL regions of the result are shown as blue contours, while in the top plot the brown contours indicate the previous measurement[10] and the red cross shows the SM prediction.

Figure 3 :
Figure 3: Top: dimuon mass distributions with the fit models used to perform the background subtraction superimposed.Bottom: the background-subtracted decay-time distributions with the fit model used to determine the B 0 s → µ + µ − effective lifetime superimposed.The distributions in the low and high BDT regions are shown in the left and right columns, respectively.
and ERC (European Union); A*MIDEX, ANR, IPhU and Labex P2IO, and Région Auvergne-Rhône-Alpes (France); Key Research Program of Frontier Sciences of CAS, CAS PIFI, CAS CCEPP, Fundamental Research Funds for the Central Universities, and Sci.& Tech.Program of Guangzhou (China); RFBR, RSF and Yandex LLC (Russia); GVA, XuntaGal and GENCAT (Spain); the Leverhulme Trust, the Royal Society and UKRI (United Kingdom).Università di Bologna, Bologna, Italy e Università di Cagliari, Cagliari, Italy f Università di Ferrara, Ferrara, Italy g Università di Firenze, Firenze, Italy h Università di Genova, Genova, Italy i Università degli Studi di Milano, Milano, Italy j Università di Milano Bicocca, Milano, Italy k Università di Modena e Reggio Emilia, Modena, Italy l Università di Padova, Padova, Italy m Scuola Normale Superiore, Pisa, Italy n Università di Pisa, Pisa, Italy o Università della Basilicata, Potenza, Italy p Università di Roma Tor Vergata, Roma, Italy q Università di Siena, Siena, Italy r Università di Urbino, Urbino, Italy s MSU -Iligan Institute of Technology (MSU-IIT), Iligan, Philippines t AGH -University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Kraków, Poland u P.N.Lebedev Physical Institute, Russian Academy of Science (LPI RAS), Moscow, Russia v Novosibirsk State University, Novosibirsk, Russia w Department of Physics and Astronomy, Uppsala University, Uppsala, Sweden x Hanoi University of Science, Hanoi, Vietnam d