Search for the decays $B_s^0\to\tau^+\tau^-$ and $B^0\to\tau^+\tau^-$

A search for the rare decays $B_s^0\to\tau^+\tau^-$ and $B^0\to\tau^+\tau^-$ is performed using proton--proton collision data collected with the LHCb detector. The data sample corresponds to an integrated luminosity of 3fb$^{-1}$ collected in 2011 and 2012. The $\tau$ leptons are reconstructed through the decay $\tau^-\to\pi^-\pi^+\pi^-\nu_{\tau}$. Assuming no contribution from $B^0\to\tau^+\tau^-$ decays, an upper limit is set on the branching fraction $\mathcal{B}(B_s^0\to\tau^+\tau^-)<6.8\times 10^{-3}$ at 95% confidence level. If instead no contribution from $B_s^0\to\tau^+\tau^-$ decays is assumed, the limit is $\mathcal{B}(B^0\to\tau^+\tau^-)<2.1 \times 10^{-3}$ at 95% confidence level. These results correspond to the first direct limit on $\mathcal{B}(B_s^0\to\tau^+\tau^-)$ and the world's best limit on $\mathcal{B}(B^0\to\tau^+\tau^-)$.

Processes where a B meson decays into a pair of oppositely charged leptons are powerful probes in the search for physics beyond the Standard Model (SM).Recently, the first observation of the B 0 s → µ + µ − decay was made [1,2] (the inclusion of charge-conjugate processes is implied throughout this Letter).Its measured branching fraction (B) is compatible with the SM prediction [3] and imposes stringent constraints on theories beyond the SM.Complementing this result with searches for the tauonic modes B → τ + τ − , where B can be either a B 0 or a B 0 s meson, is of great interest in view of the recent hints of lepton flavour non-universality obtained by several experiments.In particular the measurements of R(D ( * ) ) = B(B 0 →D ( * ) τ + ντ ) B(B 0 →D ( * ) + ν ) , where + represents either a muon, an electron or both, are found to be larger than the SM prediction by 3.9 standard deviations (σ) [4], and the measurement of R K = B(B + →K + µ + µ − ) B(B + →K + e + e − ) is 2.6σ lower than the SM prediction [5].Possible explanations for these and other [6] deviations from their SM expectations include leptoquarks, W /Z bosons and two-Higgs-doublet models (see e.g.Refs.[7,8]).In these models, the B → τ + τ − branching fractions could be enhanced with respect to the SM predictions, B(B 0 → τ + τ − ) = (2.22 ± 0.19) × 10 −8 and B(B 0 s → τ + τ − ) = (7.73±0.49)×10−7 [3], by several orders of magnitude [8][9][10][11][12].All minimal-flavour-violating models predict the same enhancement of B(B 0 s → τ + τ − ) over B(B 0 → τ + τ − ) as in the SM.
The experimental search for B → τ + τ − decays is complicated by the presence of at least two undetected neutrinos, originating from the decay of the τ leptons.The BaBar collaboration has searched for the B 0 → τ + τ − mode [13] and published an upper limit B(B 0 → τ + τ − ) < 4.10 × 10 −3 at 90% confidence level (CL).There are currently no experimental results for the B 0 s → τ + τ − mode, though its branching fraction can be indirectly constrained to be less than 3% at 90% CL [14][15][16].
In this Letter, the first search for the rare decay B 0 s → τ + τ − is presented, along with a search for the B 0 → τ + τ − decay.The analysis is performed with proton-proton collision data corresponding to integrated luminosities of 1.0 fb −1 and 2.0 fb −1 recorded with the LHCb detector at centre-of-mass energies of 7 and 8 TeV, respectively.The τ leptons are reconstructed through the decay τ − → π − π + π − ν τ , which proceeds predominantly through the decay chain τ − → a 1 (1260) − ν τ , a 1 (1260) − → ρ(770) 0 π − [17].The branching fraction B(τ − → π − π + π − ν τ ) is (9.31 ± 0.05)% [18].Due to the final-state neutrinos the τ + τ − mass provides only a weak discrimination between signal and background, and cannot be used as a way to distinguish B 0 s from B 0 decays.The number of signal candidates is obtained from a fit to the output of a multivariate classifier that uses a range of kinematic and topological variables as input.Data-driven methods are used to determine signal and background models.The observed signal yield is converted into a branching fraction using as a normalisation channel the decay B 0 → D − D + s [19,20], with The LHCb detector, described in detail in Refs.[21,22], is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5.The online event selection is performed by a trigger [23], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.The hardware trigger stage requires events to have a muon with high transverse momentum (p T ) with respect to the beam line or a hadron, photon or electron with high transverse energy in the calorimeters.For hadrons, the transverse energy threshold is around 3.5 GeV, depending on the data-taking conditions.The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from the primary pp interaction vertices (PVs).A multivariate classifier [24] is used for the identification of secondary vertices that are significantly displaced from the PVs, and are consistent with the decay of a b hadron.At least one charged particle must have p T > 1.7 GeV/c and be inconsistent with originating from any PV.
Simulated data are used to optimise the selection, obtain the signal model for the fit and determine the selection efficiencies.In the simulation, pp collisions are generated using Pythia [25] with a specific LHCb configuration [26].Decays of hadrons are described by EvtGen [27], in which final-state radiation is generated using Photos [28].The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [29] as described in Ref. [30].The τ − → π − π + π − ν τ decays are generated using the resonance chiral Lagrangian model [31] with a tuning based on the BaBar results for the τ − → π − π + π − ν τ decays [32], implemented in the Tauola generator [33].
In the offline selection of the candidate signal and normalisation decays, requirements on the particle identification (PID) [34], track quality and the impact parameter with respect to any PV are imposed on all charged final-state particles.Three charged tracks, identified as pions for the B → τ + τ − decays, and pions or kaons for the B 0 → D − D + s decays, forming a good-quality vertex are combined to make intermediate τ , D + and D + s candidates.The kinematic properties of these candidates, like momenta and masses, are calculated from the three-track combinations.The flight directions of the τ , D + and D + s candidates are estimated from their calculated momentum vectors.For the τ candidates this is a biased estimate due to the missing neutrinos.In turn, B-meson candidates are reconstructed from two oppositely charged τ or from D − and D + s candidates with decay vertices well separated from the PVs.The B-meson candidates are required to have p T > 2 GeV/c, at least one τ , D + and D + s candidate with p T > 4 GeV/c and at least one pion or kaon with p T > 2 GeV/c.No further selection requirements are imposed on the normalisation mode.
For each τ candidate, the two-dimensional distribution of the invariant masses m π + π − of the two oppositely charged two-pion combinations is divided into nine sectors, as illustrated in Fig. 1.Exploiting the intermediate ρ(770) 0 resonance of the τ decays, these sectors are used to define three regions.The signal region consists of B candidates with both τ candidates in sector 5, and is used to determine the signal yield.The signal-depleted region, composed of B candidates having at least one τ candidate in sectors 1, 3, 7 or 9, provides a sample used when optimising the selection.The control region corresponds to B candidates with one τ candidate in sectors 4, 5 or 8 and the other in sectors 4 or 8, and provides the background model.
For the B → τ + τ − modes, further requirements are imposed on two types of isolation variables that are able to discriminate signal from background from partially reconstructed decays with additional charged or neutral particles.The first class of isolation variables, based on the decision of a multivariate classifier trained on simulated signal and other b-hadron decays, discriminates against processes containing additional charged tracks that either make a good-quality vertex with any selected pion or τ candidate, or belong to the same b-hadron decay as the selected pion candidates.The second class of isolation variables is based on calorimeter activity due to neutral particles in a cone, defined in terms of the pseudorapidity and polar angle, centred on the B candidate momentum.
In addition to the isolation variables, a method to perform an analytic reconstruction of the B → τ + τ − decay chain, described in detail in Refs.[35,36], has been developed.It combines geometrical information about the decay and mass constraints on the particles (B, τ and ν) in the decay chain to calculate the τ momenta analytically.The possible solutions for the two τ momenta are found as solutions of a system of two coupled equations of second degree with two unknowns.The finite detector resolution and approximations made in the calculation prevent real solutions being found for a substantial fraction of the signal events.However, several intermediate quantities associated with the method are exploited to discriminate signal from background.
To make full use of the discrimination power present in the distributions of the selection variables, a requirement is added on the output of a neural network [37], built using seven variables: the τ ± candidate masses and decay times, a charged track isolation variable for the pions, a neutral isolation variable for the B candidate, and one variable from the analytic reconstruction method, introduced in Ref. [36].The classifier is trained on simulated B → τ + τ − decays, representing the signal, and data events from the signal-depleted region.
In order to determine the signal yield, a binned maximum likelihood fit is performed on the output of a second neural network (NN), built with 29 variables and using the same training samples.The NN inputs include the eight variables from the analytic reconstruction method listed in Ref. [36], further isolation variables, as well as kinematic and geometrical variables.The NN output is transformed to obtain a flat distribution for the signal over the range [0.0, 1.0], while the background peaks towards zero.
Varying the two-pion invariant mass sector boundaries, the signal region is optimised for the B 0 s → τ + τ − branching fraction limit using pseudoexperiments.The boundaries are set to 615 and 935 MeV/c 2 .The overall efficiency of the selection, determined using simulated B 0 (s) → τ + τ − decays, is approximately 2.2(2.4)× 10 −5 , including the geometrical acceptance.Assuming the SM prediction, the number of B 0 s → τ + τ − decays expected in the signal region is 0.02.
After the selection, the signal, signal-depleted and control regions contain, respectively, 16%, 13% and 58% of the simulated signal decays.The corresponding fractions of selected candidates in data are 7%, 37% and 47%.Most signal decays fall into the control region, but the signal region, which contains about 14 700 candidates in data after the full selection, is more sensitive due to its lower background contamination.For the fit, ten equally sized bins of NN output in the range [0.0, 1.0] are considered, where the high NN region [0.7, 1.0] was not investigated until the fit strategy was fixed.The signal model is taken from the B 0 s → τ + τ − simulation, while the background model is taken from the data control region, correcting for the presence of expected signal events in this region.The fit model is given by where N SR sim/data (N CR sim/data ) is the NN output distribution in the signal (control) region from simulation/data, s is the signal yield in the signal region, f b is a scaling factor for the background template, and ε SR (ε CR ) is the signal efficiency in the signal (control) region.The quantities s and f b are left free in the fit.The corresponding normalised distributions N SR sim , N CR sim and N CR data are shown in Fig. 2. The agreement between the background NN output distributions in the control and signal regions has been tested in different samples: in the data for the backgrounddominated NN output bins [0.0, 0.7], in a generic bb simulated sample and in several specific simulated background modes (such as Within the statistical uncertainty, the distributions have been found to agree with each other in all cases.The background in the control region can therefore be used to characterise the background in the signal region.Differences between the shapes of the background distribution in the signal and control regions of the data are the main sources of systematic uncertainties on the background model.These uncertainties are taken into account by allowing each bin in the N CR data distribution to vary according to a Gaussian constraint.The width of this Gaussian function is determined by splitting the control region into two approximately equally populated samples and taking, for each bin, the maximum difference between the NN outputs of the two subregions and the unsplit sample.The splitting is constructed to have one region more signal-like and one region more background-like.
The signal can be mismodelled in the simulation.The B 0 → D − D + s decay is used to compare data and simulation for the variables used in the NN.Ten variables are found to be slightly mismodelled and their distributions are corrected by weighting.The difference in the shape of the NN output distribution compared to the original unweighted sample is used to derive the associated systematic uncertainty.The fit procedure is validated with pseudoexperiments and is found to be unbiased.Assuming no signal contribution, the expected statistical (systematic) uncertainty on the signal yield is +62 −40 ( +40 −42 ).The fit result on data is shown in Fig. 3 and gives a signal yield s = −23 +63  −53 (stat) +41 −40 (syst), where the split between the statistical and systematic uncertainties is based on the ratio expected from pseudoexperiments.The B 0 s → τ + τ − signal yield is converted into a branching fraction using B(B where τ + τ − and D − D + s are the combined efficiencies of trigger, reconstruction and selection of the signal and normalisation channels.The branching fractions used are B(B 0 → D − D + s ) = (7.5 ± 1.1) × 10 −3 [19], B(D − → K + π − π − ) = (9.46 ± 0.24)% [18] and B(D + s → K − K + π + ) = (5.45± 0.17)% [18], and f s /f d = 0.259 ± 0.015 [38] is the ratio of B 0 s to B 0 production fractions.The efficiencies are determined using simulation, applying correction factors derived from data.The B 0 →D − D + s yield, N obs , is obtained from a fit to the mass distribution, which has four contributions: the B 0 →D − D + s component, modelled by a Hypatia function [39], a combinatorial background component, described by an exponential function, and two partially reconstructed backgrounds, B 0 → D * − D + s and B 0 → D − D * + s , modelled as in Ref. [40].The resulting fit is shown in Fig. 4 and gives a yield of N obs (including the fit uncertainty) is 1.7%.Corrections determined from J/ψ → µ + µ − and D 0 → K − π + data control samples are applied for the tracking, PID and the hadronic hardware trigger efficiencies.The relative uncertainty on α s due to selection efficiencies is 2.9%, taking into account both the limited size of the simulated samples and the systematic uncertainties.The normalisation factor is found to be α s = (4.07± 0.70) × 10 −5 .
The shapes of the NN output distributions and the selection efficiencies depend on the parametrisation used in the simulation to model the τ − → π − π + π − ν τ decay.The result obtained with the Tauola BaBar-tune model is therefore compared to available alternatives [41], which are based on CLEO data for the τ − → π − π 0 π 0 ν τ decay [42].The selection efficiency for these alternative models can be up to 20% higher, due to different structures in the two-pion invariant mass, resulting in lower limits.Dependence of the NN signal output distribution on the τ -decay model is found to be negligible.Since the alternative models are based on a different τ decay, the BaBar-tune model is chosen as default and no systematic uncertainty is assigned.

A Supplemental material
In Section A.1, details about the analytic reconstruction method are given.Sections A.2 and A.3 contain additional results related to the B 0 s → τ + τ − and B 0 → τ + τ − decay channels.

A.1 Reconstruction method
A method to perform an analytic reconstruction of the B → τ + τ − decay chain is described in the following and in detail in Ref. [35].It combines geometrical information about the decay and sets mass constraints on the particles in the decay chain (B 0 s , τ and ν) to calculate the τ momenta analytically.In these calculations, Lorentz invariance is kept manifest and the possible values for the two τ momenta are found as analytic solutions of a system of two coupled equations of second degree in two unknowns.The only remaining degree of freedom is a single Lorentz scalar, introduced below as the angle θ, measuring the asymmetry of the decay triangle in the decay-time space of the two τ leptons.
In the following, the unknown momenta of the τ ± leptons, the primary parameters of interest, are labelled by the four-vectors p µ ± ; the B 0 s → τ + τ − decay plane is defined by the B production vertex, i.e. the PV, and the two τ decay vertices; the three-vectors pointing from the PV to the τ ± decay vertices are labelled w ± ; the time intervals between the B production and the τ ± decays, w 0 ± , which cannot be measured, act as the temporal counterpart to w ± .Together, they make the four-vectors the momenta p µ ± can now be obtained from the coupled set of equations by imposing momentum conservation throughout the decay chain.Here is given in terms of the B and τ masses, m i , and decay times, t i .Using the on-shell and flight-direction constraints, it is possible to rewrite Eq. ( 4) in terms of a single unknown, chosen to be the rotation angle, θ, that diagonalises the matrix H. Thanks to this transformation, solving Eq. ( 4) becomes equivalent to finding the roots of a fourth-order polynomial where explicit expressions for the coefficients a i (θ) in terms of θ and measurable quantities are given in Appendix D of Ref. [35].Even though it is possible, in principle, to exactly determine the angle θ, a different approach has been used, because of the high complexity of the trigonometric equations involved.The value of the angle θ is in fact approximated in the calculation of the complex solutions ξ 1 , . . ., ξ 4 .Three approximations have been considered in Ref. [35].They are 1.θ = θ = π/4, representing the case where both τ leptons have the same decay time.
These approximations, together with the finite detector resolution, prevent having real solutions for a substantial fraction of the signal events.Nonetheless, quantities appearing in intermediate calculations, though not having immediate physical meaning, have been found useful to discriminate between signal and background.The most powerful of these variables are exploited in the two neural networks that are used in the candidate selection.The variable is used by both the first and second NN, while the seven variables • Im[p + (θ = θ , ξ 1 )p − (θ = θ , ξ 1 )], • Im[p + (θ = θ , ξ 3 )p − (θ = θ , ξ 3 )], • Re[p + (θ = π/4, ξ 1 )p − (θ = π/4, ξ 1 )], • Re[ξ 1 (θ = θ * )], • θ, A.2 Additional B 0 s fit result Figure 5 shows the fit result using only the background model.A likelihood-ratio test is performed comparing the nominal fit with the background-only alternative.The pvalue of the likelihood-ratio test is 0.06, and the associated z-score is 1.60, showing that the data are consistent with the background-only hypothesis.Figure 6 shows the profile likelihood of the nominal fit.Figure 7 shows the expected and observed CL s values as a function of the branching fraction.The expected limit for the B 0 s mode is B(B 0 s → τ + τ − ) < 5.7 (7.4) × 10 −3 at 90 (95)% CL.

A.3 Additional B 0 fit result
The NN output distributions for simulated B 0 → τ + τ − decays in the signal and control regions are shown in Fig. 8.The fit result, assuming no contribution from B 0 s → τ + τ − decays, is shown in Fig. 9, and Fig. 10 shows the expected and observed CL s values as a function of the branching fraction.The expected limit for the B 0 mode is B(B 0 → τ + τ − ) < 1.7 (2.1) × 10 −3 at 90 (95)% CL.

Figure 1 :
Figure 1: Two-dimensional distribution of the invariant masses m π + π − of the two oppositely charged two-pion combinations for simulated B 0 s → τ + τ − candidates.The distribution is symmetric by construction.The vertical and horizontal lines illustrate the sector boundaries.

Figure 2 :
Figure 2: (left) Normalised NN output distribution in the signal ( N SR sim ) and control ( N CR sim ) region for B 0 s → τ + τ − simulated events.(right) Normalised NN output distribution in the data control region N CR data .The uncertainties reflect the statistics of the (simulated) data.

Figure 3 :
Figure 3: Distribution of the NN output in the signal region N SR data (black points), with the total fit result (blue line) and the background component (green line).The fitted B 0 s → τ + τ − signal component is negative and is therefore shown multiplied by −1 (red line).For each bin of the signal and background component the combined statistical and systematic uncertainty on the template is shown as a light-coloured band.The difference between data and fit divided by its uncertainty (pull) is shown underneath.

D
− D + s = 10 629 ± 114, where the uncertainty is statistical.Uncertainties on α s arise from the B 0 → D − D + s fit model, the finite size of the simulated samples, the uncertainty from the corrections to the simulation and external inputs.The latter contribution, which includes the branching fractions and hadronisation fractions in Eq. (2), is dominant, giving a relative uncertainty of 17% on α s .The B 0 → D − D + s fit model is varied using the sum of two Gaussian functions with a common mean and power-law tails instead of the Hypatia function for the signal, a second-order Chebychev polynomial instead of an exponential function for the combinatorial background, and adding two other background components from B 0 s → D − D * + s and B 0 → a 1 (1260) − D * + s decays.The change in signal yield compared to the nominal fit is taken as a systematic uncertainty, adding the contributions from the four variations in quadrature.The overall relative uncertainty on α s due to N obs D − D + s

Figure 4 :
Figure 4: Invariant mass distribution of the reconstructed B 0 → D − D + s candidates in data (black points), together with the total fit result (blue line) used to determine the B 0 → D − D + s

Figure 5 : 3 LHCbFigure 6 :
Figure 5: Distribution of the NN output in the signal region N SRdata (black points), with the total fit result (blue line), and the background component (green line).Shown is the fit using the "background only" model.For each bin of the background component the combined statistical and systematic uncertainty is shown as a light-coloured band.The difference between data and fit divided by its uncertainty (pull) is shown underneath.

Figure 7 : 1 Fraction
Figure 7: The p-value derived with the CL s method as a function of B(B 0s → τ + τ − ).Expected (observed) values are shown by a dashed (full) black line.The green (yellow) band covers the regions of 68% and 95% confidence for the expected limit.The red horizontal line corresponds to the limit at 95% CL.

Figure 8 :
Figure 8: Normalised NN output distribution in the signal ( N SR sim ) and control ( N CR sim ) region for B 0 → τ + τ − simulated events.

Figure 9 :
Figure 9: Fit results for B 0 → τ + τ − .Distribution N SRdata (black points), overlaid with the total fit result (blue line), and background component (green line), assuming all signal events originate from B 0 → τ + τ − decays.The B 0 → τ + τ − signal component is also shown (red line), multiplied by −1.For each bin of the signal and background component the combined statistical and systematic uncertainty is shown as a light-coloured band.The difference between data and fit divided by its uncertainty (pull) is shown underneath.

Figure 10 :
Figure10: The p-value derived with the CL s method as a function of B(B 0 → τ + τ − ).Expected (observed) values are shown by a dashed (full) black line.The green (yellow) band covers the regions of 68% and 95% confidence for the expected limit.The red horizontal line corresponds to the limit at 95% CL.