Search for the lepton-flavour-violating decays $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ and $B^{0}\to\tau^{\pm}\mu^{\mp}$

A search for $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ and $B^{0}\to\tau^{\pm}\mu^{\mp}$ decays is performed using data corresponding to an integrated luminosity of 3 fb$^{-1}$ of proton-proton collisions, recorded with the LHCb detector in 2011 and 2012. For this search, the $\tau$ lepton is reconstructed in the $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau}$ channel. No significant signal is observed. Assuming no contribution from $B^{0}\to\tau^{\pm}\mu^{\mp}$ decays, an upper limit is set on the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ branching fraction of $\mathcal{B}\left( B^{0}_{s}\to\tau^{\pm}\mu^{\mp}\right)<4.2\times 10^{-5}$ at $95\%$ confidence level. If instead no contribution from $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ decays is assumed, a limit of $\mathcal{B}\left( B^{0}\to\tau^{\pm}\mu^{\mp}\right)<1.4\times 10^{-5}$ is obtained at $95\%$ confidence level. These are the first limit on $\mathcal{B}\left( B^{0}_{s}\to\tau^{\pm}\mu^{\mp}\right)$ and the world's best limit on $\mathcal{B}\left( B^{0}\to\tau^{\pm}\mu^{\mp}\right)$.

Lepton-flavor-violating decays of mesons containing $b$ quarks, such as $B^{0}(\bar{b}d)\to\tau^{\pm}\mu^{\mp}$ and $B^{0}_{s}(\bar{b}s)\to\tau^{\pm}\mu^{\mp}$, are extremely suppressed in the Standard Model (SM), with expected branching fractions of order $10^{-54}$ [1]. (The inclusion of charge-conjugate processes is implied throughout this Letter.) These processes involve not only quantum loops, but also neutrino oscillations. Signals at the level expected in the SM lie far below current and foreseen experimental sensitivities. However, many theoretical models proposed to explain possible experimental tensions observed in other $B$-meson decays (discussed below) naturally allow for branching fractions that are within current sensitivity. Among them, models containing a heavy neutral gauge boson ($Z'$) could lead to a $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ branching fraction of up to $10^{-8}$ [2,3] when only left-handed or right-handed couplings to quarks are considered, or of the order of $10^{-6}$ [3] if both are allowed. In models with either scalar or vector leptoquarks, the largest predictions for the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ branching fraction range from $10^{-9}$ to $10^{-5}$, depending on the assumed leptoquark mass [4–6]. The three-site Pati-Salam gauge model favours values for this branching fraction in the range $10^{-4}$ to $10^{-6}$ [7,8].
The SM predicts that the electroweak couplings for the three lepton families are universal, a result referred to as Lepton Flavor Universality (LFU). Experimental tests of LFU performed using $b\to s\ell^{+}\ell^{-}$ and $b\to c\ell^{-}\bar{\nu}_{\ell}$ processes show tensions with respect to the SM predictions for the observables $R_{K^{(*)}}$ [9,10] and $R(D^{(*)})$ [11]. For the latter, the observed discrepancy with respect to the SM prediction is greater than 3 standard deviations. Because theoretical models that can account for the possible LFU effects observed in data often predict Lepton Flavor Violation (LFV) as well [12], searches for LFV processes provide a powerful signature for probing these models.
An upper limit $\mathcal{B}\left(B^{0}\to\tau^{\pm}\mu^{\mp}\right)<2.2\times 10^{-5}$ at $90\%$ confidence level (CL) was obtained by the BaBar collaboration [13]. There are currently no experimental results for the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ mode. This Letter reports results from the first search for the decay $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$, along with the most stringent limit on the process $B^{0}\to\tau^{\pm}\mu^{\mp}$. The analysis is performed on data corresponding to an integrated luminosity of 3 fb$^{-1}$ of proton-proton ($pp$) collisions, recorded with the LHCb detector during the years 2011 and 2012 at centre-of-mass energies of 7 and 8 TeV, respectively. The $\tau$ leptons are reconstructed through the decay $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau}$, which mainly proceeds via the production of two intermediate resonances, $a_{1}(1260)^{-}\to\pi^{+}\pi^{-}\pi^{-}$ and $\rho(770)^{0}\to\pi^{+}\pi^{-}$ [14], which help in the signal selection. In this mode, the $\tau$ decay vertex can be precisely reconstructed, facilitating a good reconstruction of the $B$-meson invariant mass despite the undetected neutrino. To avoid experimenter bias, the $B$-meson invariant-mass signal region was not examined until the selection and fit procedures were finalised. The signal yield is determined by performing an unbinned maximum-likelihood fit to the reconstructed $B$-meson invariant-mass distribution and is converted into a branching fraction using the decay $B^{0}\to D^{-}(\to K^{+}\pi^{-}\pi^{-})\pi^{+}$ as a normalisation channel.
The LHCb detector [15,16] is a single-arm forward spectrometer covering the pseudorapidity range $2<\eta<5$, designed for the study of particles containing $b$ or $c$ quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the $pp$ interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of the momentum, $p$, of charged particles with a relative uncertainty varying from $0.5\%$ at low momentum to $1.0\%$ at 200 GeV/$c$. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of $(15+29/p_{\rm T})$ $\mu$m, where $p_{\rm T}$ is the component of the momentum transverse to the beam, in GeV/$c$. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The on-line event selection is performed by a trigger [17] consisting of a hardware stage based on information from the calorimeter and muon systems, followed by a software stage, which performs a full event reconstruction. At the hardware trigger stage, signal candidates are required to have a muon with high $p_{\rm T}$, while, for the normalisation sample, events are required to have a hadron with high transverse energy in the calorimeters. The software trigger requires a two-, three-, or four-track secondary vertex with a significant displacement from any primary $pp$ interaction vertex. A multivariate algorithm [18] is used to identify secondary vertices consistent with the decay of a $b$ hadron. At least one charged particle must have a transverse momentum $p_{\rm T}>1.0$ (1.6) GeV/$c$ for muons (hadrons), and must be inconsistent with originating from a PV.
Simulation is used to optimise the selection, determine the signal model for the fit and obtain the selection efficiencies. In the simulation, $pp$ collisions are generated using Pythia [19] with a specific LHCb configuration [20]. The $\tau$ decay is simulated using the Tauola decay library tuned with BaBar data [21], while the decays of all other unstable particles are described by EvtGen [22]. Final-state radiation is accounted for using Photos [23]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [24], as described in Ref. [25].
Both signal and normalisation candidates are formed using tracks that are inconsistent with originating from any PV. Candidate $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau}$ and $D^{-}\to K^{+}\pi^{-}\pi^{-}$ decays are reconstructed from three tracks forming a good-quality vertex and with particle identification information corresponding to their assumed particle hypotheses. Candidate $B^{0}_{(s)}\to\tau^{\pm}\mu^{\mp}$ decays are formed by combining a reconstructed $\tau$ lepton and an oppositely charged track identified as a muon. A control sample of same-sign candidates, which are formed by a $\tau$ lepton and a muon with identical charges, is also selected to serve, during the selection process, as a proxy for the large component of the background in which the muon and the $\tau$ candidate charges are uncorrelated. For the normalisation mode, $B^{0}\to D^{-}\pi^{+}$ candidates are made out of a reconstructed $D$ meson and an oppositely charged track identified as a pion. The decay vertex of the signal or normalisation $B$ candidate is determined through a fit to all reconstructed particles in the decay chain [26], which is required to be of good quality. The $B$-meson $p_{\rm T}$ is required to be greater than 5 GeV/$c$ for both signal and normalisation modes.
While the neutrino from the $\tau$ decay escapes detection, its momentum vector can be constrained from the measured positions of the primary and $\tau$ decay vertices, the momenta of the muon and the three pions, and the trajectory of the muon. Then, by imposing the requirement that the mass of the system formed from the three pions and the unobserved neutrino corresponds to the mass of the $\tau$ lepton, and by requiring that the $B$ decay vertex lies on the trajectories of the muon, of the $\tau$ lepton, and of the $B$ meson, the invariant mass of the $B^{0}_{(s)}$ candidate can be determined analytically up to a twofold ambiguity. Because of the quadratic nature of the equation, the computed masses may be unphysical. This occurs in $32\%$ of the selected signal in the simulated event sample due to measurement resolutions and in $48\%$ of the same-sign candidates in data. These candidates are removed, thereby improving the signal-to-background ratio. The solution whose distribution shows the largest separation between signal and background is used as the reconstructed $B$ invariant mass, $M_{B}$, in the analysis. The distributions of $M_{B}$ for candidates satisfying the previously described initial selection in the simulated signal samples and in the same-sign control sample in the data are shown in Fig. 1.
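The origin of the twofold ambiguity can be illustrated with a simplified case. The sketch below is not the reconstruction used in the analysis: it assumes that the $\tau$ flight direction is already known and that the neutrino is massless, and it solves the resulting quadratic equation for the $\tau$ momentum. The function name `tau_momentum_solutions` and its arguments are hypothetical. A negative discriminant corresponds to the unphysical case mentioned above.

```python
import math

M_TAU = 1.77686  # tau mass in GeV/c^2 (PDG value)

def tau_momentum_solutions(p3pi, e3pi, direction):
    """Return the physical solutions for |p_tau| in GeV/c (zero, one or two).

    p3pi      : (px, py, pz) of the three-pion system, in GeV/c
    e3pi      : energy of the three-pion system, in GeV
    direction : vector along the assumed tau flight direction (any length)
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Projection of the three-pion momentum on the tau direction
    a = sum(p * u for p, u in zip(p3pi, unit))
    m3pi_sq = e3pi ** 2 - sum(p * p for p in p3pi)
    big_a = 0.5 * (M_TAU ** 2 + m3pi_sq)
    # Massless neutrino: E_tau * E_3pi - p_tau * a = big_a,
    # with E_tau = sqrt(p_tau^2 + m_tau^2), which squares to
    # (E^2 - a^2) p^2 - 2 A a p + (E^2 m^2 - A^2) = 0
    denom = e3pi ** 2 - a ** 2
    disc = big_a ** 2 - M_TAU ** 2 * denom
    if disc < 0:
        return []  # unphysical: no real solution
    root = e3pi * math.sqrt(disc)
    candidates = [(big_a * a + sign * root) / denom for sign in (+1, -1)]
    # Keep positive momenta that also satisfy the unsquared equation
    return [p for p in candidates if p > 0 and big_a + a * p > 0]
```

In this simplified picture each surviving solution gives a different $\tau$ four-momentum and hence a different $B$ candidate mass, which is why a choice between the two solutions is needed.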
To reduce the data to a manageable level and focus on the rejection of the most difficult backgrounds, the low-mass region with $M_{B}<4$ GeV/$c^{2}$ is discarded. The signal loss due to this requirement is negligible.
To further reduce the background, additional requirements, optimised with same-sign candidates and simulated samples, are applied to the selected $B^{0}_{(s)}\to\tau^{\pm}\mu^{\mp}$ decays. Taking advantage of the resonant structure of the $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau}$ decay, candidates in which both combinations of oppositely charged pions have invariant masses below 550 MeV/$c^{2}$ are removed. Candidates with a three-pion invariant mass greater than 1.8 GeV/$c^{2}$ are discarded to veto the background contribution due to $D^{+}\to\pi^{+}\pi^{-}\pi^{+}$ decays.
A set of isolation variables is used to reduce background from decays with additional reconstructed particles. The first class of isolation variables exploits the presence of activity in the calorimeter to identify the contribution of neutral particles contained in a cone centred on the $B$ or $\tau$ flight directions. The second class is based on the presence of additional tracks consistent with originating from the $B$ or $\tau$ decay vertices, or uses a multivariate classifier, trained on simulated data, to discriminate against candidates whose decay products are compatible with forming good-quality vertices with other tracks in the event. These variables are combined using a Boosted Decision Tree (BDT) [27], trained on same-sign candidates and simulated $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ decays. Candidates with a BDT output compatible with that of background are discarded. A second BDT is used to reduce to a negligible level the contribution of combinatorial background, which extends over the whole mass range but dominates at higher masses. It uses variables related to vertex quality and reconstructed particle opening angles and is trained on samples of same-sign candidates with $M_{B}>6.2$ GeV/$c^{2}$ and simulated $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ decays.
Some background processes, such as $B^{0}_{(s)}\to D^{-}_{(s)}(\to\mu^{-}\bar{\nu}_{\mu})\pi^{+}\pi^{-}\pi^{+}$, have $M_{B}$ distributions peaking in the signal region. In these decays, the three pions come from the $B$ decay vertex, and therefore the reconstructed $B$ and $\tau$ decay vertices are very close. Discarding candidates with a reconstructed $\tau$ decay-time significance lower than 1.8 reduces this type of background to a negligible level while keeping $\sim75\%$ of signal, according to studies performed on simulation. All previously described selection criteria also suppress a possible contribution from the $B^{0}\to a_{1}(1260)^{-}\mu^{+}\nu_{\mu}$ mode, whose selection efficiency is 60 times lower than that of the signal. Its rate is currently unmeasured, but, given that the largest known $b\to u$ semileptonic decay branching fractions are of the order of $10^{-4}$, its branching fraction is not expected to be much higher. Events from the decay $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\pi^{0}\nu_{\tau}$ passing the selection are also included as signal.
The selection procedure retains 17 746 candidates. According to studies based on simulations, the remaining background is dominated by $B^{0}_{(s)}\to D^{(*)}_{(s)}\mu\nu X$ decays. The selection efficiencies for the signal and normalisation modes, $\epsilon_{B^{0}_{(s)}\to\tau\mu}$ and $\epsilon_{B\to D\pi}$, respectively, are estimated using simulation or, whenever possible, data. The efficiency $\epsilon_{B^{0}_{(s)}\to\tau\mu}$ includes those for both $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau}$ and $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\pi^{0}\nu_{\tau}$ decays, where the latter is weighted by the ratio of the two branching fractions. The $\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\pi^{0}\nu_{\tau}$ channel contributes $\sim16\%$ of the extracted signal yield. The tracking and particle identification efficiencies are determined using data [28,29]. The trigger efficiency for the normalisation channel is estimated using a trigger-unbiased subsample made of events which have been triggered independently of the normalisation candidate. For the signal, muons from $B^{+}\to J/\psi(\to\mu^{+}\mu^{-})K^{+}$ decays are used to evaluate the muon trigger efficiency and corrections are applied to the simulated signal samples. To account for differences between the control and the signal samples, the efficiency is computed as a function of the muon $p_{\rm T}$ and IP. Simulation as well as $B^{+}\to J/\psi(\to\mu^{+}\mu^{-})K^{+}$ decays are used to determine the software-trigger efficiency and its systematic uncertainty.
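The combination of the two $\tau$ decay modes in the signal efficiency, described earlier in this paragraph, can be written compactly. The expression below is only a sketch of one natural reading of that description: the symbols $\epsilon_{3\pi}$, $\epsilon_{3\pi\pi^{0}}$ and $r$ are introduced here for illustration and do not appear in the original text.

```latex
\epsilon_{B^{0}_{(s)}\to\tau\mu} = \epsilon_{3\pi} + r\,\epsilon_{3\pi\pi^{0}},
\qquad
r = \frac{\mathcal{B}(\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\pi^{0}\nu_{\tau})}
         {\mathcal{B}(\tau^{-}\to\pi^{-}\pi^{+}\pi^{-}\nu_{\tau})}
```

Under this assumption, the share of the signal yield coming from the $\pi^{0}$ channel would be $r\,\epsilon_{3\pi\pi^{0}}/(\epsilon_{3\pi}+r\,\epsilon_{3\pi\pi^{0}})$, the quantity quoted above as $\sim16\%$.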
The signal yield for the normalisation mode is obtained from a fit to the invariant-mass distribution of the $B^{0}\to D^{-}\pi^{+}$ candidates. In the fit the signal is modelled by the sum of two Crystal Ball (CB) [30] functions, with tails on opposite sides, having common means and widths, but independent tail parameters. The tail parameters are fixed to values determined from a fit to a sample of $B^{0}\to D^{-}(\to K^{+}\pi^{-}\pi^{-})\pi^{+}$ simulated decays, while all other parameters are left free. The small background contribution is described by an exponential function. The measured yield of the $B^{0}\to D^{-}\pi^{+}$ mode is $N_{\rm norm}=22\,588\pm176$, where the uncertainty is statistical only.
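For readers unfamiliar with the signal model, the following sketch shows the usual textbook form of a Crystal Ball shape and a sum of two such shapes with tails on opposite sides and a shared mean and width. It is an illustration only: the function names, the parameter `frac` and the unnormalised convention (peak value 1) are assumptions, not details taken from the analysis.

```python
import math

def crystal_ball(x, mean, sigma, alpha, n, tail="left"):
    """Unnormalised Crystal Ball shape, equal to 1 at x = mean.

    alpha > 0 : distance from the peak, in units of sigma, where the
                Gaussian core is replaced by a power-law tail
    n > 0     : exponent of the power-law tail
    tail      : "left" or "right", the side on which the tail sits
    """
    t = (x - mean) / sigma
    if tail == "right":
        t = -t  # mirror the shape about the mean
    if t > -alpha:
        return math.exp(-0.5 * t * t)  # Gaussian core
    # Power-law tail, matched in value and slope at t = -alpha
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)

def double_crystal_ball(x, mean, sigma, frac, alpha_l, n_l, alpha_r, n_r):
    """Sum of two CB shapes with common mean and width and opposite tails."""
    left = crystal_ball(x, mean, sigma, alpha_l, n_l, "left")
    right = crystal_ball(x, mean, sigma, alpha_r, n_r, "right")
    return frac * left + (1.0 - frac) * right
```

In a fit of the kind described above, the four tail parameters would be held fixed while the mean, width, relative fraction and yield float.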
The $B^{0}_{(s)}\to\tau^{\pm}\mu^{\mp}$ branching fractions can be written as
$$\mathcal{B}\left(B^{0}_{(s)}\to\tau^{\pm}\mu^{\mp}\right)=\alpha^{\rm norm}_{(s)}\times N^{\rm sig}_{(s)},\qquad (1)$$
where $N^{\rm sig}_{(s)}$ is the number of observed $B^{0}_{(s)}\to\tau^{\pm}\mu^{\mp}$ decays and $\alpha^{\rm norm}_{(s)}$ a normalisation factor. The latter is defined using externally measured quantities, including the ratio of $b$-quark hadronisation fractions to $B^{0}_{s}$ and $B^{0}$ mesons, $f_{B^{0}_{s}}/f_{B^{0}}=0.259\pm0.015$ [31]. The uncertainty on the normalisation factors has three components: the statistical uncertainty due to the sizes of the signal and normalisation simulated and data samples, the systematic uncertainty on the selection efficiencies (dominated by the trigger efficiency contribution, $\sim11\%$) and the total uncertainty on the externally measured quantities.
A final BDT is built to split the selected candidates into four samples with different signal-to-background ratios. It combines 16 discriminating variables, none of which are correlated with the $B$-meson invariant mass. The most important ones are the invariant masses of the three-pion system and of the two combinations of oppositely charged pions, the $B$-meson IP and flight distance significances, and the output of the BDT based on isolation variables. The output of the BDT is transformed to have a uniform distribution between 0 and 1 for $B^{0}_{s}\to\tau^{\pm}(\to\pi^{\pm}\pi^{\mp}\pi^{\pm}\nu_{\tau})\mu^{\mp}$ simulated decays. As a consequence, its distribution for the background peaks at low BDT values. All samples are divided into four bins of equal width in BDT output. Their distributions are shown in Fig. 2.
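The transformation of the BDT output to a uniform signal distribution can be illustrated with an empirical cumulative distribution function. The sketch below is not the analysis code: the names `make_flattener` and `bin_index` are hypothetical, and it assumes that the flattening is derived from the scores of simulated signal candidates.

```python
import bisect

def make_flattener(signal_scores):
    """Build a map from raw classifier score to the empirical signal CDF.

    Applied to the signal sample itself, the returned function yields
    values spread uniformly over (0, 1].
    """
    ordered = sorted(signal_scores)
    n = len(ordered)

    def flatten(score):
        # Fraction of signal candidates with a score <= the given score
        return bisect.bisect_right(ordered, score) / n

    return flatten

def bin_index(flat_score, n_bins=4):
    """Index of the equal-width bin of [0, 1] that contains flat_score."""
    return min(int(flat_score * n_bins), n_bins - 1)
```

With such a transformation each equal-width bin holds about the same fraction of signal, while background, which tends to have low scores, accumulates in the lowest bins.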
The signal yield is evaluated by performing a simultaneous unbinned maximum-likelihood fit to the $M_{B}$ distributions in the range [4.6, 5.8] GeV/$c^{2}$ of the four samples corresponding to different BDT bins. In each bin, the data are described by the sum of a signal and a background component. The background shape is modelled by the upper tail of a reversed CB function, whose peak position and tail parameters are shared among BDT bins. For the determination of the systematic uncertainties, different sets of constrained parameters or alternative background models, such as the sum of two Gaussian functions, are considered. The signal shapes are described by double-sided Hypatia functions [33] whose parameters are initialized to the values obtained from a fit to the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ and $B^{0}\to\tau^{\pm}\mu^{\mp}$ simulated samples and allowed to vary within Gaussian constraints accounting for possible discrepancies between data and simulation. The widths of the Hypatia functions are $\sim330$ MeV/$c^{2}$ for both signal modes in the most sensitive BDT bin. As the separation between the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ and $B^{0}\to\tau^{\pm}\mu^{\mp}$ signal shapes is limited, two independent fits are performed, each assuming the contribution of either the $B^{0}_{s}$ or the $B^{0}$ signal only. The signal fractional yields in each BDT bin are Gaussian constrained according to their expected values and uncertainties. The fit result corresponding to the hypothesis of the $B^{0}_{s}$ signal only is shown in Fig. 3.
The fit procedure is validated by performing fits to a set of pseudoexperiments where the mass distributions are randomly generated according to the background model observed in the data. The pulls of all fitted parameters are normally distributed except those of the signal yields $N^{\rm sig}$, which have the expected widths but exhibit a very small bias of $-3\pm1$ ($2\pm2$) events for the $B^{0}_{s}$ ($B^{0}$) mode. This effect is accounted for by adding the bias to $N^{\rm sig}$ in the simultaneous fits to the four BDT regions for both $B^{0}$ and $B^{0}_{s}$. The uncertainties on the obtained signal yields account for the statistical ones as well as those on the signal and background shape parameters. The yields show no evidence of any signal excess.
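The idea of validating a fit with pseudoexperiments can be shown with a deliberately simple toy. In the sketch below the likelihood fit is replaced by a sample-mean estimator, which is an assumption made purely for illustration; the function name and all parameter values are hypothetical. For an unbiased estimator with correct uncertainties, the pulls should have a mean compatible with 0 and a width close to 1.

```python
import math
import random

def pull_study(true_mean, true_sigma, n_events, n_toys, seed=12345):
    """Return (mean, width) of the pull distribution over n_toys toys."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(n_toys):
        # Generate one pseudoexperiment from the known model
        sample = [rng.gauss(true_mean, true_sigma) for _ in range(n_events)]
        # "Fit": estimate the mean and its uncertainty from the sample
        estimate = sum(sample) / n_events
        variance = sum((x - estimate) ** 2 for x in sample) / (n_events - 1)
        error = math.sqrt(variance / n_events)
        pulls.append((estimate - true_mean) / error)
    mean_pull = sum(pulls) / n_toys
    width = math.sqrt(sum((p - mean_pull) ** 2 for p in pulls) / (n_toys - 1))
    return mean_pull, width
```

A mean pull significantly different from zero would indicate a bias of the kind reported for the signal yields, which can then be corrected for.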
Using the calculated values of the normalisation factors $\alpha^{\rm norm}$ and $\alpha^{\rm norm}_{s}$ from Eq. 1 together with Eq. 3, the observed yields from the likelihood fits are translated into upper limits on the branching fractions using the CL$_{s}$ method [34,35]. The total uncertainty on the normalisation factor is accounted for as an additional Gaussian constraint in the simultaneous fit. Furthermore, a systematic uncertainty on the signal yield of 34 (41) for the $B^{0}_{s}$ ($B^{0}$) mode, derived using different sets of constrained parameters or alternative background models, is added to account for the uncertainties in the background shape. The expected and observed CL$_{s}$ values as a function of the branching fraction are shown in the Supplemental Material [36]. The corresponding limits on the $B^{0}_{s}$ and $B^{0}$ branching fractions at $90\%$ and $95\%$ CL are given in Table 1 assuming negligible contribution from the $B^{0}\to a_{1}(1260)^{-}\mu^{+}\nu_{\mu}$ mode. A possible residual contribution of this background would lower the expected limits by $\sim16\%\times\left(\mathcal{B}(B^{0}\to a_{1}(1260)^{-}\mu^{+}\nu_{\mu})/10^{-4}\right)$. The impact of systematic uncertainties on the final limits is about $35\%$, dominated by the uncertainty on the background model.
Table 1 (partial): Expected $1.6\times 10^{-5}$, $1.9\times 10^{-5}$.
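The logic of the CL$_{s}$ method can be illustrated with a single-bin counting experiment. The sketch below is a strong simplification and not the procedure of the analysis, which is based on the simultaneous likelihood fit with constrained nuisance parameters: it assumes a Poisson-distributed event count with a known background expectation and no systematic uncertainties. All function names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n_obs + 1):
        term *= mu / k
        total += term
    return total

def cls_value(n_obs, signal, background):
    """CLs = CL(s+b) / CL(b) for a counting experiment."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b

def upper_limit(n_obs, background, cl=0.95, hi=1000.0):
    """Signal yield at which CLs falls to 1 - cl, found by bisection."""
    lo = 0.0
    target = 1.0 - cl
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cls_value(n_obs, mid, background) > target:
            lo = mid  # signal hypothesis not yet excluded
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A limit on the signal yield obtained in this way would be converted into a limit on the branching fraction by multiplying it by the normalisation factor, following Eq. 1.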
These results represent the best upper limits to date. They constitute a factor $\sim2$ improvement with respect to the BaBar result for the $B^{0}$ mode [13] and the first measurement for the $B^{0}_{s}$ mode. The allowed range on the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ branching fraction preferred by the three-site Pati-Salam model [7,8] is significantly reduced by the results presented in this Letter.

Figure 1: Normalized distributions of the reconstructed invariant mass for $B^{0}_{s}$ and $B^{0}$ in simulated event samples and for same-sign candidates in data, after applying the initial event selection (see the text).

Figure 2: Final BDT output binned distributions for data and simulated signal samples. The markers are displaced horizontally to improve visibility.

Figure 3: Distributions of the reconstructed $B$ invariant mass in data in the four final BDT bins with the projections of the fit for the $B^{0}_{s}$ signal-only hypothesis overlaid. The lower part of each figure shows the normalised residuals.

Figures 4 and 5 show the expected and observed CL$_{s}$ values as a function of the $B^{0}_{s}\to\tau^{\pm}\mu^{\mp}$ and $B^{0}\to\tau^{\pm}\mu^{\mp}$ branching fractions.