Search for the lepton flavour violating decays $B^{+} \to K^{+} \tau^\pm \ell^\mp$ ($\ell = e, \mu$) at Belle

We present a search for the lepton-flavour-violating decays $B^+ \to K^+ \tau^\pm \ell^\mp$, with $\ell = (e, \mu)$, using the full data sample of $772 \times 10^6$ $B\overline{B}$ pairs recorded by the Belle detector at the KEKB asymmetric-energy $e^+ e^-$ collider. We use events in which one $B$ meson is fully reconstructed in a hadronic decay mode. We find no evidence for $B^\pm \to K^\pm \tau \ell$ decays and set upper limits on their branching fractions at the 90% confidence level in the $(1$-$3) \times 10^{-5}$ range. The obtained limits are the world's best results.

Recently, there has been a resurgence of interest in the study of leptoquark fields in light of discrepancies in semi-leptonic $B$ decays [1], collectively known as $B$-physics anomalies, which challenge the assumed Lepton Flavour Universality (LFU) of fundamental interactions. These measurements have been obtained by studying two different quark transitions: $b \to c\tau\nu$ and $b \to s\ell\ell$, where $\ell = (e, \mu)$. If confirmed by further measurements, these discrepancies would be clear evidence of New Physics in which new heavy particles couple preferentially to second- and third-generation leptons. Many extensions of the Standard Model (SM) that include violation of LFU predict Lepton Flavour Violating (LFV) processes in hadron decays with charged leptons in the final state [2]. In particular, the vector leptoquark $U_1$ with SM quantum numbers $(3, 1)_{2/3}$ has been identified as the only single-mediator solution [3]. In the minimal scenario, $U_1$ provides an interesting prediction, i.e. lower bounds on the lepton-flavour-violating $b \to s\tau^\mp\mu^\pm$ decay modes, for example $\mathcal{B}(B \to K\tau\mu) > 0.7 \times 10^{-7}$. The branching fractions for the two $\ell\tau$ charge combinations are not in general the same, as they depend on the details of the physics mechanism producing the decay.
In this Letter, we report a search for $B^+ \to K^+\tau^\mp\ell^\pm$ decays using the full Belle data sample recorded at the $\Upsilon(4S)$ resonance. The inclusion of the charge-conjugate decay mode is implied. This is the first such search from Belle.
The analysis is based on the full data sample of $772 \times 10^6$ $B\overline{B}$ pairs collected with the Belle detector [6] at the KEKB asymmetric-energy $e^+e^-$ collider [7]. The Belle detector is a large-solid-angle spectrometer, which includes a silicon vertex detector (SVD), a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter (ECL) composed of CsI(Tl) crystals, all located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux return located outside the coil is instrumented to detect $K^0_L$ mesons and identify muons (KLM).
The analysis procedure is developed using Monte Carlo (MC) simulation based on events generated with EvtGen [?], which includes final-state radiation (FSR) effects simulated by PHOTOS [9]; the detector response is simulated by GEANT3 [10]. The $B^+ \to K^+\tau^\pm\ell^\mp$ decays are generated using a uniform three-body phase-space model (PHSP); we also consider variations in the linear combinations of the relevant operators for the $b \to s\tau\ell$ transitions, $O_i$, and the relative Wilson coefficients $C_i^{\tau\ell}$, where $i = 9, 10, S, P$ [11].
In each event, we require a fully reconstructed hadronic $B^\pm$ decay, which we refer to as the tagged $B$ meson candidate or $B_{\rm tag}$. This is done using the Full Event Interpretation (FEI) algorithm [12], a machine-learning algorithm developed for $B$-tagged analyses at Belle and Belle II. It supports both hadronic and semileptonic tagging, reconstructing $B$ mesons across more than 4000 individual decay chains. The training is performed in a hierarchical manner: final-state particles are first reconstructed from detector information, then unstable particles (e.g. $D$, $D^*$) are built up from these particles, and the reconstruction of $B$ mesons is performed last. For each $B_{\rm tag}$ candidate reconstructed by the FEI, a value of the final multivariate classifier output, $\Sigma_{\rm FEI}$, is assigned. $\Sigma_{\rm FEI}$ is distributed between zero and one, representing candidates identified as being background-like and signal-like, respectively.
For the hadronic FEI, the minimum number of tracks per event satisfying certain quality criteria is set to three, as the vast majority of $B$-meson decay chains include at least three charged particles, and such a criterion is useful for suppressing background from non-$B\overline{B}$ events. Requirements are placed on the impact parameters to ensure close proximity to the interaction point (IP): less than 0.5 cm in the transverse plane and less than 2.0 cm along the $z$ axis (parallel to the $e^+$ beam). ECL clusters that are used for $\gamma$ reconstruction are required to satisfy a region-dependent energy threshold criterion. All the intermediate states ($\pi^0$, $J/\psi$, $K^0_S$ and $D^{(*)}$ mesons) must pass loose cuts on the reconstructed invariant mass, and only the best candidates in terms of $\Sigma_{\rm FEI}$ are kept. The FEI results in many $B_{\rm tag}$ candidates per event. The number of these candidates is reduced with selections on the beam-energy-constrained mass.
We then search for the signal $B \to K\tau\ell$ decay in the rest of the event, which we refer to as the signal $B$ meson candidate or $B_{\rm sig}$. The notation $B \to K\tau\ell$ refers to one of the following four final states that we consider, where in addition to the kaon of opposite charge to $B_{\rm tag}$ we associate the primary lepton, $\mu$ or $e$: $B^+ \to K^+\tau^+\mu^-$ and $B^+ \to K^+\tau^+e^-$, defined as OS$_{\mu,e}$ modes because the kaon and the primary lepton have opposite charge, and $B^+ \to K^+\tau^-\mu^+$ and $B^+ \to K^+\tau^-e^+$, defined as SS$_{\mu,e}$ modes because they have the same charge. In all cases, we require that the $\tau$ decays via $\tau \to e\nu\nu$, $\tau \to \mu\nu\nu$, or $\tau \to \pi\nu$. The combined branching fraction for these decays is 46% [13]. The $\tau \to \rho\nu$ mode, despite not being explicitly reconstructed, contributes significantly to the $\tau \to \pi\nu$ candidates (roughly one half) because of its large branching fraction ($\sim$25%).
We reconstruct $B^+ \to K^+\tau^\pm\ell^\mp$ decays by selecting three charged particles that originate from the vicinity of the IP and are not associated with the $B_{\rm tag}$. We require impact parameters less than 1.0 cm in the transverse plane and less than 4.0 cm along the $z$ axis. To reduce backgrounds from low-momentum particles, we require that tracks have a minimum transverse momentum of 100 MeV/$c$. From the list of selected tracks, we identify $K^+$ candidates using a likelihood ratio $R_{K/\pi} = L_K/(L_K + L_\pi)$, where $L_K$ and $L_\pi$ are the likelihoods for charged kaons and pions, respectively, calculated from the number of photoelectrons in the ACC, the specific ionization in the CDC, and the time of flight as determined from the TOF. We select kaons by requiring $R_{K/\pi} > 0.6$, which has a kaon identification efficiency of 83% and a pion misidentification rate of 5%. Similarly, we select pions by requiring $R_{\pi/K} > 0.6$, which has a pion identification efficiency of 84% and a kaon misidentification rate of 6%.
Muon candidates are identified based on information from the KLM. We require that candidates have a momentum greater than 0.8 GeV/$c$ (enabling them to sufficiently penetrate the KLM), and a penetration depth and degree of transverse scattering consistent with those of a muon [14]. The latter information is used to calculate a normalized muon likelihood ratio $R_\mu = L_\mu/(L_\mu + L_\pi + L_K)$, where $L_\mu$ is the likelihood for muons, for which we require $R_\mu > 0.9$. For this requirement, the average muon detection efficiency is 89%, with a pion misidentification rate of 1.5% [15].
Electron candidates are required to have a momentum greater than 0.5 GeV/$c$ and are identified using the ratio of ECL cluster energy to the CDC track momentum, the shower shape in the ECL, the matching of the track with the ECL cluster, the specific ionization in the CDC, and the number of photoelectrons in the ACC. This information is used to calculate a normalized electron likelihood ratio $R_e = L_e/(L_e + L_{\rm hadrons})$, where $L_{\rm hadrons}$ is a product of hadron likelihoods, for which we require $R_e > 0.9$. This requirement has an efficiency of 92% and a pion misidentification rate below 1% [16].
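The normalized likelihood ratios used for particle identification, such as $R_{K/\pi}$ and $R_e$, all share the same form. As a minimal illustration, the following Python sketch evaluates such a ratio and applies the kaon threshold quoted in the text. The function names and the numerical likelihood values are hypothetical; in the analysis the likelihoods are computed from the detector information listed above.

```python
def likelihood_ratio(l_signal, l_others):
    """Normalized ratio L_signal / (L_signal + sum of alternative likelihoods)."""
    denom = l_signal + sum(l_others)
    if denom <= 0.0:
        raise ValueError("likelihoods must sum to a positive value")
    return l_signal / denom


def is_kaon(l_k, l_pi, threshold=0.6):
    """Kaon selection R_{K/pi} > 0.6, the threshold quoted in the text."""
    return likelihood_ratio(l_k, [l_pi]) > threshold


# Hypothetical likelihood values, for illustration only.
r_k = likelihood_ratio(0.8, [0.2])
print(r_k, is_kaon(0.8, 0.2))
```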
After selecting one charged kaon, one prompt lepton (electron or muon) and the $\tau$ daughter (electron, muon or pion) with the appropriate charge combination, we require that there are no tracks other than those associated with $B_{\rm tag}$ or $B_{\rm sig}$. The charged kaon and the prompt lepton are uniquely determined by minimizing the $\chi^2$ of the $B_{\rm sig}$ vertex fit for the prompt tracks. If the $\tau$ daughter satisfies more than one particle-identification hypothesis, the leptonic $\tau$ decay has priority.
Unlike other $B$ decays involving $\tau$ leptons (e.g. $B \to \tau\nu$, $B \to D^*\tau\nu$), the $B \to K\tau\ell$ channel has the unique property that the one (or two) neutrino(s) come only from the $\tau$ itself, allowing the signal yield to be extracted using the recoil mass, $M_{\rm recoil}$, which should peak at the mass of the $\tau$ lepton. Such a variable is easily obtained at $B$ factories, because of the known initial kinematics and the full reconstruction of the other $B$ in the event. In fact, if we consider the $B_{\rm sig}$, the 4-momentum of the $\tau$ can be written as
$$p_\tau = p_{B_{\rm sig}} - p_K - p_\ell , \tag{1}$$
where $p_{B_{\rm sig}}$ is not known a priori. In the frame where the $\Upsilon(4S)$ resonance is at rest, the two $B$ mesons are back to back, hence
$$\vec{p}^{\,*}_{B_{\rm sig}} = -\vec{p}^{\,*}_{B_{\rm tag}} ; \tag{2}$$
furthermore, the two $B$ mesons have the same energy, which is half the energy $\sqrt{s}$ of the $\Upsilon(4S)$:
$$E^*_{B_{\rm sig}} = E^*_{B_{\rm tag}} = \frac{\sqrt{s}}{2} = E^*_{\rm beam} . \tag{3}$$
In order to obtain the best resolution on the $B$ variables, we replace $E^*_{B_{\rm tag}}$ with $E^*_{\rm beam}$, but use the reconstructed $\vec{p}^{\,*}_{B_{\rm tag}}$ rather than its average value. Using the condition (2) and the substitution $E^*_B = E^*_{\rm beam}$ in equation (1), we obtain (in natural units, $c = 1$)
$$M_{\rm recoil} = m_\tau = \sqrt{m_B^2 + m_{K\ell}^2 - 2\left(E^*_{\rm beam} E^*_{K\ell} + |\vec{p}^{\,*}_{B_{\rm tag}}|\,|\vec{p}^{\,*}_{K\ell}| \cos\theta\right)} ,$$
where $m_B$ is the $B$ mass, $m_{K\ell}$, $E^*_{K\ell}$ and $\vec{p}^{\,*}_{K\ell}$ are the invariant mass, energy and momentum of the $K\ell$ system, and $\theta$ is the angle between $\vec{p}^{\,*}_{B_{\rm tag}}$ and $\vec{p}^{\,*}_{K\ell}$.
The main source of background consists of Cabibbo-favoured transitions from $B^+B^-$ events. For the OS configurations, where the primary lepton charge is opposite to the $B_{\rm sig}$ charge, the dominant background comes from semileptonic $D$ decays, such as $B^+ \to \overline{D}{}^0(\to K^+\ell^-\overline{\nu}_\ell)X^+$. On the other hand, for the SS configurations the primary lepton and the $B_{\rm sig}$ have the same charge, and semileptonic $B^+$ decays like $B^+ \to \overline{D}{}^0(\to K^+X^-)X\ell^+\nu_\ell$ provide the three charged particles for the $B_{\rm sig}$ candidates. Events compatible with a $B^+_{\rm sig} \to \overline{D}{}^0(\to K^+\pi^-)X^+$ decay are rejected by vetoing candidates in the range $1.81~{\rm GeV}/c^2 < m_{K^+t^-} < 1.91~{\rm GeV}/c^2$, where $t$ denotes the primary lepton or the track from the $\tau$ in the OS or SS case, respectively. In the OS case, only the $K\tau\mu$ modes show such a $D^0$ component, because a pion is more likely to be misidentified as a muon than as an electron. In the SS case, the $D^0$ peak is much more prominent; it relates to the $\tau \to \pi$ mode and is independent of the flavour of the primary lepton.
We further improve the signal selection using a Boosted Decision Tree (BDT) classification. Two classifiers are trained for the background suppression. The first one is optimised to reduce the $B\overline{B}$ background and uses as inputs kinematic information as well as the topology of the $B_{\rm sig}$ and information on the rest of the event (the set of ECL clusters that are not used for the $B_{\rm sig}$ and $B_{\rm tag}$ reconstruction). In particular we use: the invariant mass $m_{K^+t^-}$, which helps in suppressing the combinatorial background from charm decays; the number of ECL clusters that are not associated with the reconstructed event and the sum of their energies; the extended Fox-Wolfram moments [17]; the distance from the IP of the signal vertex; and the distance between the primary kaon and each of the other two signal tracks.
For each mode only the ten most important variables are kept for the final training, the metric being the information gain provided by each feature in all the decision trees used for the classifier. The threshold $t$ on the BDT response is optimized using a figure of merit [18] built from $\epsilon(t)$, the efficiency for the cut $t$, and $N_{\rm bkg}$, the number of background events surviving the cut $t$ in the signal region, defined as $1.68~{\rm GeV}/c^2 < M_{\rm recoil} < 1.87~{\rm GeV}/c^2$, which contains $\sim$80% of the signal events. The MC sample used to estimate the background corresponds to twice the integrated luminosity of the data. After the cut on the first BDT output, a large fraction of the surviving background comes from $q\overline{q}$ ($q = u, d, s, c$) events; for this reason a second BDT classifier is trained on these events. The input variables to suppress the continuum background are event-shape variables such as $R_2$ and the CLEO cones [19], and the angle $\theta_T$ between the thrust axes calculated from final-state particles for the $B_{\rm tag}$ and for the rest of the event in the c.m. frame.
We use control samples in order to evaluate systematic uncertainties related to data/MC discrepancies and to calibrate the signal shape PDF, which is fixed from MC simulation. The first control sample consists of $B^+ \to D^-\pi^+\pi^+$ events, generated in MC as the result of two-body $B^+ \to D^{**0}(\to D^-\pi^+)\pi^+$ decays, with $D^{**0} = \{D_0^{*0}, D_2^{*0}\}$ according to Refs. [20,21]. This channel has a similar topology to our signal, as the $D$ can be treated as the $\tau$, allowing for a comparison of the performance of the first BDT classifier between data and MC (Fig. 1 (top) for OS$_\mu$). For the calibration of the efficiency of the $q\overline{q}$ suppression, a second control sample, $B \to J/\psi K$, is used because the final state is similar and no use of the $B_{\rm sig}$ topology is required (Fig. 1 (bottom) for OS$_\mu$).
The signal yields for $B \to K\tau\ell$ decays are obtained by performing unbinned extended maximum-likelihood fits to the $M_{\rm recoil}$ distributions. The PDF used to model reconstructed signal decays consists of the sum of a reversed Crystal Ball function, to model the main peak and the high-side power-law tail, and a broad Gaussian with the same mean parameter, to describe the candidates with worse resolution due to imperfect $B_{\rm tag}$ reconstruction. The background events have a smooth shape in the $M_{\rm recoil}$ signal region and are described by a second-order Chebychev polynomial. The yields and the background shape parameters are floated, while the parameters describing the signal PDF are fixed from the MC simulation. We apply corrections to these parameters to account for small differences between MC simulation and data. These correction factors are obtained from the $B^+ \to \overline{D}{}^{(*)0}\pi^+$ control samples, where $M_{\rm recoil}$ is calculated from the pion from the $B^+$ decay and the $B_{\rm tag}$.
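To make the fit model described above more concrete, the following Python sketch writes out the three shape components: a reversed Crystal Ball function, a Gaussian sharing its mean, and a second-order Chebychev polynomial. It is a sketch under stated assumptions, not the fit code used in the analysis: the shapes are left unnormalized, the function and parameter names are hypothetical, and `alpha` is assumed to be positive.

```python
import math


def reversed_crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalized Crystal Ball shape with the power-law tail on the high side.

    Assumes alpha > 0; the tail starts at x = mean + alpha * sigma.
    """
    t = (x - mean) / sigma
    if t < alpha:
        return math.exp(-0.5 * t * t)  # Gaussian core
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b + t) ** (-n)  # high-side power-law tail


def gaussian(x, mean, sigma):
    """Unnormalized Gaussian with unit height at the mean."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def signal_shape(x, mean, sigma_cb, alpha, n, sigma_gauss, f_gauss):
    """Reversed Crystal Ball plus a broad Gaussian with the same mean.

    Because the components are unnormalized, f_gauss is only a relative
    weight of the two peak heights, not a fraction of events.
    """
    core = reversed_crystal_ball(x, mean, sigma_cb, alpha, n)
    return (1.0 - f_gauss) * core + f_gauss * gaussian(x, mean, sigma_gauss)


def chebyshev2(x, lo, hi, c1, c2):
    """Second-order Chebyshev polynomial on [lo, hi], mapped onto [-1, 1]."""
    u = 2.0 * (x - lo) / (hi - lo) - 1.0
    return 1.0 + c1 * u + c2 * (2.0 * u * u - 1.0)
```

In a real fit each component would be normalized over the fit range and multiplied by its floated yield; a dedicated fitting framework would normally take care of that step.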
We validate our fitting procedure and check for fit bias using MC simulation. We generate large ensembles of simulated experiments in which the $M_{\rm recoil}$ distributions are generated from the PDFs used for fitting.
The $M_{\rm recoil}$ distributions for LFV $B \to K\tau\ell$ decays, along with projections of the fit result, are shown in Fig. 2. The fitted signal yields listed in Table I are consistent with zero for all four modes. We calculate the upper limit (UL) for these modes at the 90% C.L. using a frequentist method. In this method, for different numbers of signal events $N_{\rm sig}({\rm gen})$, we generate 10000 pseudo-experiments with signal and background PDFs as obtained in the nominal data fit, with each set of events being statistically equivalent to our data sample of 711 fb$^{-1}$. We fit all these simulated data sets and, for each value of $N_{\rm sig}({\rm gen})$, we calculate the fraction of MC experiments that have $N_{\rm sig} \le N_{\rm sig}({\rm data})$. The 90% C.L. upper limit is taken to be the value of $N_{\rm sig}({\rm gen})$ (called here $N^{\rm UL}_{\rm sig}$) for which 10% of the experiments have $N_{\rm sig} \le N_{\rm sig}({\rm data})$. The upper limit on the branching fraction, $\mathcal{B}^{\rm UL}$, is then derived from $N^{\rm UL}_{\rm sig}$ using three inputs: $N_{B\overline{B}} = (772 \pm 11) \times 10^6$, the number of $B\overline{B}$ pairs; $f^{+-} = 0.514 \pm 0.006$ [13], the branching fraction $\mathcal{B}(\Upsilon(4S) \to B^+B^-)$ for charged $B$ decays; and $\varepsilon$, the signal reconstruction efficiency. By default $\varepsilon$ is obtained with signal PHSP MC samples [22]; we also consider the NP model, a combination of the effective operators $O_{S,P}$, that gives the smallest efficiency, obtained by reweighting the $q^2 = m^2_{\tau\ell}$ distribution. The systematic uncertainty in $\mathcal{B}^{\rm UL}$ is included by smearing the $N_{\rm sig}$ distribution obtained from the MC fits with the fractional systematic uncertainty. The results are listed in Table I.
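The logic of the frequentist scan described above can be illustrated with a deliberately simplified Python sketch. The main assumption is that the unbinned $M_{\rm recoil}$ fit is replaced here by a Poisson counting experiment with a known expected background, so that the condition $N_{\rm sig} \le N_{\rm sig}({\rm data})$ reduces to comparing event counts; systematic smearing is not included. The function names, the scan step and the random seed are hypothetical choices.

```python
import math
import random


def poisson(rng, mu):
    """Draw a Poisson-distributed integer (Knuth's method; fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def toy_upper_limit(n_obs, bkg, n_toys=10000, step=0.1, cl=0.90, seed=1):
    """Scan the generated signal yield upwards until at most (1 - cl) of the
    pseudo-experiments give a count at or below the observed one."""
    rng = random.Random(seed)
    s_gen = 0.0
    while True:
        s_gen += step
        n_below = sum(
            1 for _ in range(n_toys) if poisson(rng, s_gen + bkg) <= n_obs
        )
        if n_below / n_toys <= 1.0 - cl:
            return s_gen


# Zero observed events and no background: the scan should stop close to
# the analytic value -ln(0.1), about 2.3 signal events.
print(toy_upper_limit(0, 0.0))
```

The granularity of the result is set by `step`, and its statistical precision by `n_toys`; the analysis itself uses full fits to each pseudo-experiment rather than event counts.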
The systematic uncertainties in our measurements are listed in Table II, where additive uncertainties arise from the signal yield, while multiplicative uncertainties are from the efficiency. Uncertainties in the shape of the PDFs used for the signal are evaluated by varying all fixed parameters by $\pm 1\sigma$, including the correction factors to the shapes obtained from the $B^+ \to \overline{D}{}^{(*)0}\pi^+$ control samples, and varying the fraction of the Gaussian ($f_{\rm sig}$) by 10%. The resulting change in the signal yield is taken as the systematic uncertainty. The reconstruction efficiency for $B_{\rm tag}$, evaluated via MC simulation, is corrected to account for differences between MC and data in the branching fractions and models used for hadronic $B$ decays. This correction is evaluated by comparing the number of events containing both a $B_{\rm tag}$ and a semileptonic $B \to D\ell\nu$ decay in data and in MC simulation [23]. The resulting correction factor is $(85 \pm 5)$%, and the uncertainty in this value is taken as a systematic uncertainty.
The systematic uncertainty due to charged-track reconstruction is 0.35% per track. The uncertainty due to $K^+$ and $\pi^+$ (for the $\tau \to \pi\nu$ mode) identification is 1.3%, as measured with a $D^{*+} \to D^0(\to K^-\pi^+)\pi^+$ sample. The uncertainty due to lepton identification is evaluated using $J/\psi \to \ell^+\ell^-$ events, resulting in an uncertainty of 0.3% for muons and 0.4% for electrons. The systematic uncertainty arising from the number of $B\overline{B}$ pairs is 1.4%. We compare the efficiency of the BDT selection between data and MC samples with the control channel $B^+ \to D^-\pi^+\pi^+$ for $B\overline{B}$ suppression and $B^+ \to J/\psi K^+$ for continuum suppression; the differences between data and MC simulation are assigned as a systematic uncertainty.
We use a systematic uncertainty of 1.2% in the fraction f +− [13].
We have searched for the lepton-flavour-violating decays $B^+ \to K^+\tau^\pm\ell^\mp$ using the full Belle data set. We find no evidence for these decays and set upper limits on their branching fractions at the 90% C.L. in the $(1$-$3) \times 10^{-5}$ range.

FIG. 2: Observed $M_{\rm recoil}$ distributions for the four $B \to K\tau\ell$ modes, along with projections of the fit result. The black dots show the data, the dashed blue curve shows the background component, and the solid red curve shows the overall fit result. The dash-dotted green curve shows the signal PDF, with a normalization corresponding to the 90% C.L. upper limit.

Our results are the most stringent limits to date.
This work, based on data collected using the Belle detector, which was operated until June 2010, was supported by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan, the Japan Society for the Promotion of Science (JSPS), and the Tau-Lepton Physics Research Center of Nagoya University; the Australian Research Council including grants DP180102629, DP170102389, DP170102204, DE220100462, DP150103061, FT130100303; Austrian Federal Ministry of Education, Science and Research (FWF) and FWF Austrian Science Fund No. P 31361-N36; the National Natural Science Foundation of China under Contracts No. 11675166, No. 11705209, No. 11975076, No. 12135005, No. 12175041, No. 12161141008; Key Research Program of Frontier Sciences, Chinese Academy of Sciences (CAS), Grant No. QYZDJ-SSW-SLH011; Project ZR2022JQ02 supported by Shandong Provincial Natural Science Foundation; the Ministry of Education, Youth and Sports of the Czech Republic under Contract No. LTT17020; the Czech Science Foundation Grant No. 22-18469S; Horizon 2020 ERC Advanced Grant No. 884719 and ERC Starting Grant No. 947006 "InterLeptons" (European Union); the Carl Zeiss Foundation, the Deutsche Forschungsgemeinschaft, the Excellence Cluster Universe, and the Volkswa-

TABLE I: Efficiencies, fit yields, and branching fraction upper limits at the 90% C.L. for PHSP