Search for the lepton-flavour violating decays $B^+ \to K^+ {\mu}^{\pm} e^{\mp}$

A search for the lepton-flavour violating decays $B^+ \to K^+ {\mu}^{\pm} e^{\mp}$ is performed using a sample of proton-proton collision data, collected with the LHCb experiment at centre-of-mass energies of $7$ and $8~{\rm TeV}$ and corresponding to an integrated luminosity of $3~{\rm fb}^{-1}$. No significant signal is observed, and upper limits on the branching fractions are set as $\mathcal{B}(B^+ \to K^+ {\mu}^- e^+)<7.0~(9.5) \times 10^{-9}$ and $\mathcal{B}(B^+ \to K^+ {\mu}^+ e^-)<6.4~(8.8) \times 10^{-9}$ at 90 (95)% confidence level. The results improve the current best limits on these decays by more than one order of magnitude.

The observation of neutrino oscillations has provided the first evidence for lepton-flavour violation (LFV) in neutral leptons. By contrast, LFV in the charged sector is negligible in the Standard Model (SM) [1], and any observation of a charged LFV decay would be indisputable evidence for physics beyond the SM (BSM). In light of recent flavour anomalies in semileptonic $b \to s \ell^+ \ell^-$ transitions [2][3][4], many SM extensions have been proposed that link lepton-universality violation to LFV, predicting in particular a significantly enhanced decay rate in $b \to s \mu^{\mp} e^{\pm}$ processes. In this Letter a search for the decays $B^+ \to K^+ \mu^{\pm} e^{\mp}$ is reported (the inclusion of charge-conjugate processes is implied throughout this Letter). Their branching fractions are predicted to be in the range $10^{-10}$ to $10^{-8}$ in leptoquark models [5,6], extended gauge boson models [7], or models including CP violation in the neutrino sector [8]. Currently, the best limits, $\mathcal{B}(B^+ \to K^+ \mu^- e^+) < 9.1 \times 10^{-8}$ and $\mathcal{B}(B^+ \to K^+ \mu^+ e^-) < 13 \times 10^{-8}$, have been set by the BaBar collaboration at the 90% confidence level [9].
A data set of proton-proton ($pp$) collisions corresponding to an integrated luminosity of $3~{\rm fb}^{-1}$, recorded with the LHCb detector in 2011 and 2012 at centre-of-mass energies of 7 TeV and 8 TeV, respectively, is used in this analysis. The two final states with different lepton charge combinations are studied independently, since they could be affected differently by BSM dynamics. The yields of the $B^+ \to K^+ \mu^{\pm} e^{\mp}$ decays are normalised to those of the $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ decay, which has a well-known branching fraction [10], the same topology, and similar signatures in the detector. The $B^+ \to K^+ J/\psi(\to e^+ e^-)$ decay is also used as a control channel in the analysis.
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, described in detail in Refs. [11,12]. The detector includes a silicon-strip vertex detector surrounding the $pp$ interaction region, tracking stations located either side of a dipole magnet, ring-imaging Cherenkov (RICH) detectors, calorimeters and muon chambers.
The online event selection is performed by a trigger [13], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware trigger stage, $B^+ \to K^+ \mu^{\pm} e^{\mp}$ and $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ event candidates are required to have a muon with high transverse momentum, $p_{\rm T}$. In the subsequent software trigger, at least one charged particle must have $p_{\rm T} > 1.7$ GeV/$c$ in the 2011 data set and $p_{\rm T} > 1.6$ GeV/$c$ in 2012, unless the particle is identified as a muon, in which case $p_{\rm T} > 1.0$ GeV/$c$ is required. This track must be significantly displaced from any primary interaction vertex (PV) in the event. Finally, a two- or three-track secondary vertex with a significant displacement from any PV is required, where a multivariate algorithm [14] is used for the identification of secondary vertices consistent with the weak decay of a $b$ hadron.
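The transverse-momentum part of the software-trigger track requirement can be summarised in a few lines. The sketch below is only an illustration of the thresholds quoted above; the function name and interface are hypothetical, and the displacement and secondary-vertex requirements are not modelled.

```python
def passes_software_pt(pt_gev: float, is_muon: bool, year: int) -> bool:
    """Return True if a track satisfies the pT threshold quoted in the text.

    pt_gev  : transverse momentum in GeV/c
    is_muon : whether the track is identified as a muon
    year    : data-taking year, 2011 or 2012
    """
    if year not in (2011, 2012):
        raise ValueError("only the 2011 and 2012 data sets are described")
    if is_muon:
        threshold = 1.0          # muon-identified tracks, both years
    elif year == 2011:
        threshold = 1.7
    else:
        threshold = 1.6
    return pt_gev > threshold
```

A track with $p_{\rm T} = 1.65$ GeV/$c$ that is not identified as a muon would therefore satisfy the 2012 threshold but not the 2011 one.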
Simulated samples are used to evaluate signal efficiencies, to train multivariate classifiers, to model the shape of the invariant mass of the $B^+ \to K^+ \mu^{\pm} e^{\mp}$ signal candidates, and to study physics backgrounds. In the simulation, $pp$ collisions are generated using Pythia [15,16] with a specific LHCb configuration [17]. Decays of unstable particles are described by EvtGen [18], in which final-state radiation is generated using Photos [19]. A phase-space model is adopted for signal $B^+ \to K^+ \mu^{\pm} e^{\mp}$ decays. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [20] as described in Ref. [21].
The $B^+ \to K^+ \mu^{\pm} e^{\mp}$ candidates passing the trigger selection are reconstructed by combining three charged tracks originating from a good-quality common vertex. The tracks forming the $B^+$ candidate are required not to originate from any PV and must have sizeable transverse momentum. Due to the long lifetime of the $B^+$ meson, this vertex is required to be well separated from any PV. The $B^+$ direction vector, determined from its production and decay vertex positions, must be aligned with its momentum vector. The mass of the reconstructed $B^+$ candidate, $m(K^+ \mu^{\pm} e^{\mp})$, is restricted to lie within $\pm 1500$ MeV/$c^2$ of the known $B^+$ meson mass [10]. Furthermore, the $B$-meson decay products must be well identified as a kaon, an electron and a muon, exploiting information from the Cherenkov detectors, the calorimeters, and the muon stations. The electron candidate kinematics are corrected for bremsstrahlung photon emission if a compatible photon candidate in the calorimeter is found. Kaon and electron candidates that have hits in the muon stations consistent with their trajectories are rejected. The same selection is applied to the normalisation (control) channels, for which the dimuon (dielectron) invariant mass is additionally required to be consistent with the known $J/\psi$ mass [10]. The selection and analysis procedures were developed without inspecting the signal data in the region $m(K^+ \mu^{\pm} e^{\mp})$ […]
The most significant backgrounds originate from partially reconstructed $B^+$ decays, e.g. from double semileptonic $B^+ \to \bar{D}^0 X \ell^+ \nu_{\ell}$ with $\bar{D}^0 \to K^+ Y \ell'^- \bar{\nu}_{\ell'}$ decays, where $X$ and $Y$ represent hadrons, while $\ell$ and $\ell'$ are leptons. They are removed by imposing the requirement $m(K^+ \ell^-) > 1885$ MeV/$c^2$. Contributions from decays involving charmonium resonances, where one lepton is misidentified as a kaon or as a lepton of a different flavour, are rejected by mass vetoes.
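The requirement on the kaon-lepton invariant mass can be illustrated with a short calculation from the track momenta. This is a minimal sketch, not the analysis code: the function names are hypothetical, momenta are taken in MeV/$c$, and the particle masses are rounded world-average values.

```python
import math

# Rounded particle masses in MeV/c^2 (assumed inputs, not taken from the text)
M_KAON = 493.677
M_MUON = 105.658
M_ELECTRON = 0.511

def four_vector(px, py, pz, mass):
    """Build (E, px, py, pz) from a three-momentum and a mass hypothesis."""
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)

def invariant_mass(*particles):
    """Invariant mass of the sum of the given four-vectors."""
    e, px, py, pz = (sum(p[i] for p in particles) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding

def passes_kl_requirement(kaon, lepton, threshold=1885.0):
    """Keep candidates with m(K+ l-) above the threshold quoted in the text."""
    return invariant_mass(kaon, lepton) > threshold
```

In the selection the lepton entering this combination is the one with charge opposite to the kaon, so the requirement acts on a different lepton flavour in each of the two final states.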
The combinatorial background, which consists of random tracks that are associated to a common vertex, is reduced using a boosted decision tree (BDT) [22,23] algorithm. This BDT combines information about the $B^+$ meson kinematics and information related to its flight distance, decay vertex quality and impact parameter with respect to the primary vertex. In addition, it uses information such as the impact parameters of the electron, muon and kaon candidates, and the isolation of the $B^+$ candidate from any other charged track in the event [24].
The BDT is trained on simulated $B^+ \to K^+ \mu^{\pm} e^{\mp}$ events that have satisfied the previous requirements. The simulated samples are adjusted using $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ and $B^+ \to K^+ J/\psi(\to e^+ e^-)$ decays in data to correct data-simulation differences in the $B$-meson production kinematics, vertex quality, and detector occupancy represented by the number of tracks in the detector. The upper-mass sideband, corresponding to $m(K^+ \mu^{\pm} e^{\mp}) \in [5385, 6000]$ MeV/$c^2$, is used as a proxy for the background. The training is performed using a k-folding approach [25] with ten folds, which allows the whole sample to be used without biasing the output of the classifier. The optimal requirement on the BDT classifier is chosen to give the best expected upper limits on the branching fractions.
The candidates surviving this multivariate selection are used to train a second BDT, dedicated to rejecting background from partially reconstructed $b$-hadron decays. The background sample for the training is taken from the lower-mass sideband in data, corresponding to $m(K^+ \mu^{\pm} e^{\mp}) \in [4550, 4985]$ MeV/$c^2$, where the partially reconstructed background is expected to contribute. The signal proxy is the same as for the first BDT. The training uses the same k-folding approach and the same set of discriminating variables as the first BDT, with the addition of the ratio between the projections of the electron and the $K^+ \mu^{\pm}$ momenta orthogonal to the $B$-meson direction of flight. The requirements on the second BDT are optimised in the same manner as for the first BDT. The final stage of the selection applies requirements on particle identification (PID) variables, based on a neural-net classifier for the kaon, electron and muon [26], to suppress candidates from decays in which at least one particle is misidentified.
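The k-folding idea can be sketched independently of the classifier. In the illustration below, which assumes the common scheme where each fold is scored by a model trained on the remaining folds, the fold assignment by index, the function names and the toy one-dimensional "classifier" are all hypothetical stand-ins; the analysis itself uses a BDT with ten folds.

```python
import random

def k_fold_scores(events, labels, k, train, predict):
    """Score every event with a model that did not see it during training."""
    scores = [None] * len(events)
    for fold in range(k):
        train_idx = [i for i in range(len(events)) if i % k != fold]
        test_idx = [i for i in range(len(events)) if i % k == fold]
        model = train([events[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in test_idx:
            scores[i] = predict(model, events[i])
    return scores

# Toy stand-in for the BDT: a cut placed midway between the class means.
def train_midpoint(xs, ys):
    sig = [x for x, y in zip(xs, ys) if y == 1]
    bkg = [x for x, y in zip(xs, ys) if y == 0]
    return 0.5 * (sum(sig) / len(sig) + sum(bkg) / len(bkg))

def predict_midpoint(threshold, x):
    return x - threshold  # larger values are more signal-like

rng = random.Random(1)
events = ([rng.gauss(1.0, 1.0) for _ in range(200)]
          + [rng.gauss(-1.0, 1.0) for _ in range(200)])
labels = [1] * 200 + [0] * 200
scores = k_fold_scores(events, labels, 10, train_midpoint, predict_midpoint)
```

Because no event is scored by a model trained on it, the full sample can be used both for training and for the subsequent analysis steps.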
The performance of the PID algorithms is not perfectly simulated, and thus a correction is performed using high-purity calibration data samples of muons from $B \to X J/\psi(\to \mu^+ \mu^-)$ decays, electrons from $B^+ \to K^+ J/\psi(\to e^+ e^-)$ decays, and kaons from $D^{*+} \to D^0(\to K^- \pi^+)\pi^+$ decays [27]. The calibration data are binned in the particle's momentum and pseudorapidity, and in the detector occupancy. Particle identification variables for the simulated data sets are sampled from the distributions of calibration data in the corresponding bin. The performance of the PID resampling is validated on both the […]
The potential contamination from $b$-hadron decays in the signal mass region after selection is analysed using dedicated simulated samples. Two categories are analysed: (i) fully reconstructed $B$ decays with at least one particle in the final state misidentified, such as the semileptonic decays $B^+ \to K^+ \ell^+ \ell^-$ and $B^+ \to K^+ J/\psi(\to \ell^+ \ell^-)$, or fully hadronic $B^+$ decays such as $B^+ \to K^+ \pi^+ \pi^-$; and (ii) partially reconstructed decays in which at least one particle is not reconstructed and one or more particles are misidentified in addition, such as […] and $B^+ \to \bar{D}^0 \ell^+ \nu_{\ell}$ transitions, where the $\bar{D}^0$ meson decays further to $K^+ \pi^-$ or $K^+ \ell^- \bar{\nu}_{\ell}$. The expected number of candidates from each possible background source after the selection is evaluated from simulation and is found to be negligible.
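The binned resampling of PID variables can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the tuple layout and the bin edges are hypothetical, overflow values are clamped to the outermost bins, and calibration entries are drawn with equal probability, whereas a background-subtracted calibration sample would carry per-event weights.

```python
import bisect
import random
from collections import defaultdict

def bin_index(value, edges):
    """Index of the bin containing value, clamped to the outermost bins."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)

def bin_key(p, eta, occ, p_edges, eta_edges, occ_edges):
    return (bin_index(p, p_edges),
            bin_index(eta, eta_edges),
            bin_index(occ, occ_edges))

def build_templates(calibration, p_edges, eta_edges, occ_edges):
    """Group calibration PID values by (momentum, pseudorapidity, occupancy) bin.

    calibration: iterable of (p, eta, occupancy, pid_value) tuples.
    """
    templates = defaultdict(list)
    for p, eta, occ, pid in calibration:
        templates[bin_key(p, eta, occ, p_edges, eta_edges, occ_edges)].append(pid)
    return templates

def resample_pid(track, templates, p_edges, eta_edges, occ_edges, rng=random):
    """Draw a PID value for a simulated track from the matching calibration bin."""
    p, eta, occ = track
    values = templates.get(bin_key(p, eta, occ, p_edges, eta_edges, occ_edges))
    if not values:
        raise LookupError("no calibration entries in this bin")
    return rng.choice(values)
```

Repeating the procedure with finer or coarser bin edges is one way to probe the sensitivity of the result to the binning choice.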
The branching fraction $\mathcal{B}(B^+ \to K^+ \mu^{\pm} e^{\mp})$ is measured relative to the normalisation channel using
$$\mathcal{B}(B^+ \to K^+ \mu^{\pm} e^{\mp}) = \mathcal{B}(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)) \times \frac{\varepsilon(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))}{\varepsilon(B^+ \to K^+ \mu^{\pm} e^{\mp})} \times \frac{N(B^+ \to K^+ \mu^{\pm} e^{\mp})}{N(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))} \equiv \alpha \times N(B^+ \to K^+ \mu^{\pm} e^{\mp}),$$
where $\varepsilon(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))$ and $\varepsilon(B^+ \to K^+ \mu^{\pm} e^{\mp})$ denote the efficiencies of the normalisation and signal channels, respectively, and $N(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))$ and $N(B^+ \to K^+ \mu^{\pm} e^{\mp})$ are the observed $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ and $B^+ \to K^+ \mu^{\pm} e^{\mp}$ yields, respectively. The value of the branching fraction of the normalisation mode is $\mathcal{B}(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)) = (6.02 \pm 0.17) \times 10^{-5}$, taken from Ref. [10]. The yield of the normalisation channel is determined from an unbinned extended maximum-likelihood fit to the invariant mass $m(K^+ \mu^+ \mu^-)$ of the selected $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ candidates, performed separately on 2011 and 2012 data. The sum of two Crystal Ball functions [28] is used to parameterise the signal, while an exponential function models the background. The yields resulting from the fits are $26940 \pm 170$ for 2011 and $59220 \pm 250$ for 2012 data. The efficiencies are calculated taking into account all selection requirements. The analysis is performed assuming a phase-space model for the signal decay. Efficiency maps in bins of the squared invariant masses of the final-state particles, $m^2_{K^+ e^{\pm}}$ and $m^2_{K^+ \mu^{\mp}}$, are provided in Fig. 1 to allow for the interpretation of the result in different BSM scenarios.
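The role of the normalisation can be made concrete with a short numerical sketch. The branching fraction and yields below are the values quoted in the text; the efficiencies passed in the usage example are purely hypothetical placeholders, since the measured efficiencies are not quoted in this section, and the function names are illustrative.

```python
# Inputs quoted in the text
BR_NORM = 6.02e-5                           # B(B+ -> K+ J/psi(-> mu+ mu-))
N_NORM = {2011: 26940.0, 2012: 59220.0}     # fitted normalisation yields

def normalisation_factor(br_norm, n_norm, eff_norm, eff_sig):
    """alpha = B_norm * eff_norm / (eff_sig * N_norm)."""
    return br_norm * eff_norm / (eff_sig * n_norm)

def branching_fraction(n_sig, alpha):
    """Signal branching fraction corresponding to a signal yield n_sig."""
    return alpha * n_sig

# Hypothetical efficiencies, for illustration only (not from the paper)
alpha_2012 = normalisation_factor(BR_NORM, N_NORM[2012],
                                  eff_norm=0.010, eff_sig=0.005)
```

Written this way, $\alpha$ is the branching fraction that corresponds to a single signal candidate, so a smaller $\alpha$ means a more sensitive search.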
All efficiencies are determined from calibrated simulation samples, and the normalisation factors for the two decay channels are given in Table 1. The two data-taking periods are combined into a single normalisation factor, taking into account the relative data sizes and efficiencies. The ratio $\alpha/\mathcal{B}(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))$, which excludes external inputs, is also quoted.
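One way to write the combination of the two periods, assuming that the relation $\mathcal{B} = \alpha_i N_i$ holds separately in each period $i$ with a common signal branching fraction $\mathcal{B}$, is sketched below. The explicit formula is not quoted in the text, so the period labels and this form of the combination are assumptions.

```latex
\begin{align}
N_{\rm sig} &= N_{\rm sig}^{2011} + N_{\rm sig}^{2012}
             = \mathcal{B}\left(\frac{1}{\alpha_{2011}} + \frac{1}{\alpha_{2012}}\right), \\
\frac{1}{\alpha} &= \frac{1}{\alpha_{2011}} + \frac{1}{\alpha_{2012}} .
\end{align}
```

Since each $\alpha_i$ is inversely proportional to the normalisation yield and to the signal efficiency of its period, the period with the larger data sample and higher efficiency dominates the combined factor.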
Table 1: Normalisation factor $\alpha$ for the $B^+ \to K^+ \mu^- e^+$ and $B^+ \to K^+ \mu^+ e^-$ final states. The ratio $\alpha/\mathcal{B}(B^+ \to K^+ J/\psi(\to \mu^+ \mu^-))$ […]
The invariant-mass distribution of $B^+ \to K^+ \mu^{\pm} e^{\mp}$ candidates is modelled differently depending on whether bremsstrahlung photons have been included in the momentum calculation for the electrons. The sum of two Crystal Ball functions with a common mean value is used in both cases. In the case of bremsstrahlung the tails are on opposite sides of the peak. Otherwise, the two tails share the same parameters. Their values, obtained from the $B^+ \to K^+ \mu^{\pm} e^{\mp}$ simulation, are corrected taking into account the differences between data and simulation in $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ and $B^+ \to K^+ J/\psi(\to e^+ e^-)$ decays. Two types of unbinned maximum-likelihood fits are performed on the data set. The first fit assumes that only background is present and models it with an exponential function. From this fit $3.9 \pm 1.1$ and $0.9 \pm 0.6$ background candidates are expected in the signal mass window for the $B^+ \to K^+ \mu^+ e^-$ and $B^+ \to K^+ \mu^- e^+$ modes, respectively. The second fit includes the signal component and is used to determine the signal yields. The $B^+ \to K^+ \mu^- e^+$ and $B^+ \to K^+ \mu^+ e^-$ invariant-mass distributions are fitted separately.
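The extrapolation of an exponential background into a mass window can be sketched as follows. This is not the fit used in the analysis: it assumes that the exponential slope and the number of candidates in the fitted regions are already known, and the function names and any numerical inputs are hypothetical. Masses are in MeV/$c^2$.

```python
import math

def exp_integral(slope, lo, hi):
    """Integral of exp(slope * m) over [lo, hi]; handles a vanishing slope."""
    if abs(slope) < 1e-12:
        return hi - lo
    return (math.exp(slope * hi) - math.exp(slope * lo)) / slope

def expected_in_window(n_fitted, slope, fitted_regions, window):
    """Scale the count in the fitted regions to the window via integral ratios.

    n_fitted       : number of candidates in the fitted regions
    slope          : exponential slope in (MeV/c^2)^-1
    fitted_regions : list of (lo, hi) mass intervals used to count n_fitted
    window         : (lo, hi) interval in which the background is predicted
    """
    norm = sum(exp_integral(slope, lo, hi) for lo, hi in fitted_regions)
    return n_fitted * exp_integral(slope, *window) / norm
```

For instance, `expected_in_window(n, slope, regions, (5100.0, 5370.0))` would give the prediction in the signal mass window for a given slope and set of fitted regions.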
After unblinding the data set, 1 (2) candidates are found in the signal mass window $m(K^+ \mu^{\pm} e^{\mp}) \in [5100, 5370]$ MeV/$c^2$ for the $B^+ \to K^+ \mu^- e^+$ ($B^+ \to K^+ \mu^+ e^-$) channel, in agreement with the background-only hypothesis (cf. Fig. 2). The upper limits on the branching fractions are set with the CL$_{\rm s}$ method [29], using the GammaCombo framework [30,31] with a one-sided test statistic. The likelihoods are computed from fits to the invariant-mass distributions with the normalisation constant constrained to its nominal value, accounting for statistical and systematic uncertainties. Pseudoexperiments, in which the nuisance parameters are set to their best-fit values and the background yield is varied according to its systematic uncertainty, are used for the evaluation of the test statistic. The resulting upper limits are shown in Fig. 3 and Table 2.
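The logic of the CL$_{\rm s}$ construction can be illustrated with a single-bin counting experiment. This is a deliberately simplified sketch: it uses the observed count as the test statistic and ignores nuisance parameters, whereas the analysis uses likelihood fits to the mass distributions and pseudoexperiments, so it is not expected to reproduce the quoted limits. The function names are hypothetical.

```python
import math

def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def cls(n_obs, s, b):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(n_obs, b, cl=0.90, s_max=100.0, tol=1e-6):
    """Signal yield at which CLs drops to 1 - cl, found by bisection."""
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return hi
```

A limit on the signal yield obtained in this way translates into a limit on the branching fraction after multiplication by the normalisation factor $\alpha$.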
The dominant sources of systematic uncertainty on the upper limits are due to the corrections applied to the simulation. These include the correction of the $B$-meson production kinematics, the residual difference between the corrections obtained with muon and electron candidates, and the PID resampling. Furthermore, the determination of trigger efficiencies and the knowledge of the background invariant-mass distribution are also considered in evaluating the systematic uncertainty. The systematic uncertainty on the sampling procedure of the PID variables includes two components. The first stems from applying the sPlot [32] method to the calibration data, and adds an uncertainty of 0.1% for kaons and muons, and 3% for electrons, the latter being a conservative estimate originating from a comparison of the sPlot method with a cut-and-count method. The second component addresses the choice of binning in the sampling procedure. It is evaluated by recalculating the normalisation factor $\alpha$ using a finer and a coarser binning, and taking the largest deviation with respect to the baseline result.
A small difference in the correction procedure is observed depending on the choice of control channel, namely $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ or $B^+ \to K^+ J/\psi(\to e^+ e^-)$. This difference, referred to as the electron-muon difference, is taken as a systematic uncertainty.
The systematic uncertainty from the fitting model is determined to be 2.1% using a bootstrapping approach. The systematic uncertainty on the background model is calculated by repeating the fit using an alternative model, where the exponential function is obtained from a sample enriched in background events. The difference between the alternative and nominal background parametrisation is taken as a systematic uncertainty. The uncertainty on the knowledge of the $B^+ \to K^+ J/\psi(\to \mu^+ \mu^-)$ branching fraction is combined with the uncertainty due to the limited size of the simulation sample and is propagated to the normalisation constant, corresponding to a systematic uncertainty of 3.5%. A summary of systematic uncertainties is reported in Table 3.
In conclusion, a search for the lepton-flavour violating decays $B^+ \to K^+ \mu^{\pm} e^{\mp}$ is performed using data from proton-proton collisions recorded with the LHCb experiment at centre-of-mass energies of 7 TeV and 8 TeV, corresponding to an integrated luminosity of $3~{\rm fb}^{-1}$. A uniform distribution of signal events within the phase space accessible to the final-state particles is assumed. No excess is observed over the background-only hypothesis, and the resulting upper limits on the branching fractions are $\mathcal{B}(B^+ \to K^+ \mu^- e^+) < 7.0~(9.5) \times 10^{-9}$ and $\mathcal{B}(B^+ \to K^+ \mu^+ e^-) < 6.4~(8.8) \times 10^{-9}$ at 90 (95)% confidence level. The results improve the current best limits on the decays [9] by more than one order of magnitude. Moreover, the measurements impose strong constraints on the aforementioned extensions to the SM [5][6][7][8].

Figure 1: Efficiency of (left) $B^+ \to K^+ \mu^- e^+$ and (right) $B^+ \to K^+ \mu^+ e^-$ as a function of the squared invariant masses $m^2_{K^+ e^{\pm}}$ and $m^2_{K^+ \mu^{\mp}}$. The variation of efficiency across the Dalitz plane is due to applied vetoes. The efficiencies are given in per mille.

Figure 2: Invariant-mass distributions of the (left) $B^+ \to K^+ \mu^- e^+$ and (right) $B^+ \to K^+ \mu^+ e^-$ candidates obtained on the combined data sets recorded in 2011 and 2012, with the background-only fit functions (blue continuous line) and the signal model normalised to 10 candidates (red dashed line) superimposed. The signal window is indicated with grey dotted lines. The difference between the two distributions arises from the effect of the $m(K^+ \ell^-)$ requirement.

Figure 3: Upper limits on the branching fractions of (left) $B^+ \to K^+ \mu^- e^+$ and (right) $B^+ \to K^+ \mu^+ e^-$ decays obtained on the combined data sets recorded in 2011 and 2012. The red solid line (black solid line with data points) corresponds to the distribution of the expected (observed) upper limits, and the light blue (dark blue) band contains the $1\sigma$ ($2\sigma$) uncertainties.