Search for the lepton-flavour violating decays $B^0_s \rightarrow e^{\pm}\mu^{\mp}$ and $B^0 \rightarrow e^{\pm} \mu^{\mp}$

A search for the lepton-flavour violating decays $B^0_s \to e^{\pm} \mu^{\mp}$ and $B^0 \to e^{\pm} \mu^{\mp}$ is performed with a data sample, corresponding to an integrated luminosity of 1.0 fb$^{-1}$ of $pp$ collisions at $\sqrt{s} = 7$ TeV, collected by the LHCb experiment. The observed number of $B^0_s \to e^{\pm} \mu^{\mp}$ and $B^0 \to e^{\pm} \mu^{\mp}$ candidates is consistent with background expectations. Upper limits on the branching fractions of both decays are determined to be $BR(B^0_s \to e^{\pm} \mu^{\mp})<1.1 \,(1.4) \times 10^{-8}$ and $BR (B^0 \to e^{\pm} \mu^{\mp})<2.8 \,(3.7) \times 10^{-9}$ at 90% (95%) confidence level (C.L.). These limits are a factor of twenty lower than those set by previous experiments. Lower bounds on the Pati-Salam leptoquark masses are also calculated, $M_{\rm LQ} (B^0_s \to e^{\pm} \mu^{\mp})>107$ TeV/c$^2$ and $M_{\rm LQ} (B^0 \to e^{\pm} \mu^{\mp})>126$ TeV/c$^2$ at 95% C.L., and are a factor of two higher than the previous bounds.

Rare decays that are forbidden in the Standard Model (SM) probe potential contributions from new processes and particles at a scale beyond the reach of direct searches. The decays B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ and their charged conjugate processes 1 are forbidden within the SM, in which lepton flavour is conserved. These decays are allowed in some scenarios beyond the SM that include models with heavy singlet Dirac neutrinos [1], supersymmetric models [2] and the Pati-Salam model [3]. The latter predicts a new interaction to mediate transitions between leptons and quarks via exchange of spin−1 gauge bosons, called Pati-Salam leptoquarks (LQ), that carry both colour and lepton quantum numbers.
Current limits from ATLAS [4,5,6] and CMS [7,8,9] on the masses of first, second or third generation leptoquarks are in the range [0.4, 0.9] TeV/c 2 , depending on the value of the couplings and the decay channel. These leptoquarks arise from a coupling between a quark and lepton of the same generation. The decays B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ can be mediated by other leptoquarks which couple leptons and quarks that are not necessarily from the same generation [10,11], such as when the τ lepton couples to a first or second quark generation.
The trigger [21] consists of a hardware stage (L0), based on information from the calorimeter and muon systems, followed by a software stage (HLT) that applies a full event reconstruction, and is split into two stages called HLT1 and HLT2. Candidate B 0 (s) → e ± µ ∓ decays considered in this analysis must satisfy a hardware decision that requires the presence of a muon candidate with transverse momentum p T > 1.5 GeV/c. All tracks considered in the HLT1 are required to have p T > 0.5 GeV/c. The muon track of the B 0 (s) → e ± µ ∓ candidates is required to have p T > 1.0 GeV/c and impact parameter, IP > 0.1 mm. The HLT2 consists of exclusive, cut-based triggers for B 0 (s) two-body decays, and inclusive multivariate [21,22] b-hadron triggers.
The B 0 → K + π − decay is used as the normalization channel and B 0 (s) → h + h − (h ( ) = K, π) decays are used as a control channel, since both have the same event topology as the signal. The B 0 → K + π − yield is computed from the yield of B 0 (s) → h + h − decays, and the fraction of B 0 → K + π − in the B 0 (s) → h + h − sample, as described in Ref. [23]. In order to minimize the bias introduced by the trigger requirements, only B 0 (s) → h + h − candidates that are triggered independently of the presence of either of the two signal hadrons at L0 and HLT1 are considered.
The B 0 (s) → e ± µ ∓ candidates that pass the trigger selection criteria are further required to have well identified electron and muon [24] candidates. The measured momenta of the electrons are corrected to account for loss of momentum by bremsstrahlung in the detector using the photon energy deposition in the electromagnetic calorimeter [25]. The signal candidates are required to be displaced with respect to any pp collision vertex (PV), and form a secondary vertex (SV) with χ 2 per degree of freedom smaller than 9 and separated from the PV in the downstream direction by a flight distance significance greater than 15.
Only B 0 (s) candidates with an impact parameter χ 2 (χ 2 IP ) less than 25 are considered. The χ 2 IP of a B 0 (s) candidate is defined as the difference between the χ 2 of the PV reconstructed with and without the considered candidate. When more than one PV is reconstructed, that giving the smallest χ 2 IP for the B 0 (s) candidate is chosen. Only B 0 (s) candidates with invariant mass in the range [4.9, 5.9] GeV/c 2 are kept for further analysis. The selection criteria for the B 0 (s) → h + h − and B 0 → K + π − candidates are identical to those of the signal, apart from those used for particle identification.
A two-stage multivariate selection based on boosted decision trees (BDT) [26,27] is applied to the B 0 (s) → e ± µ ∓ candidates following the same strategy as Ref. [28]. The two multivariate discriminants are trained using simulated samples, B 0 s → e ± µ ∓ for signal and bb → l ± l ∓ X for background (where l ( ) can either be a µ or an e and X is any other set of particles), which is dominated by simultaneous semileptonic decays of both b and b hadrons within the same event.
The requirement on the first multivariate discriminant [28] removes 75 % of the background while retaining 93 % of signal, as determined from simulation using half of the available samples to train and the other half to evaluate the efficiencies. The same selection is applied to the B 0 → K + π − normalization channel and the efficiencies of this requirement for the signal and normalization channel are equal within 1.2 %, as determined from simulation.
The surviving background mainly comprises random combinations of electrons and muons from semileptonic bb → e ± µ ∓ X decays. In total 5766 electronmuon pairs pass the trigger, the offline selection and the first multivariate discriminant requirements. The selected candidates are classified in a binned twodimensional space formed by the electron-muon invariant mass and the output of a second BDT, for which nine variables are employed [28]. The BDT output is independent of the invariant mass for signal inside the search window. The output is transformed such that the signal is approximately uniformly distributed between zero and one, while the background peaks at zero.
The probability for a signal event to have a given BDT value is obtained from data using the B 0 (s) → h + h − sample [29,30]. Simulated samples of B 0 (s) → e ± µ ∓ and B 0 (s) → h + h − decays have been used to check that the distributions of the variables entering in the BDT that do not depend on the bremsstrahlung radiation are in good agreement. Corrections to the BDT shape due to the presence of the radiation emitted by the electron of the B 0 (s) → e ± µ ∓ decays have been evaluated using simulation. The number of B 0 (s) → h + h − signal events in each BDT bin is determined by fitting the h + h − invariant mass distribution. The systematic uncertainty on the signal BDT probability distribution function is taken to be the maximum spread in the fractions of yields going into each bin, obtained by fitting the same B 0 (s) → h + h − dataset with different signal and background fit models. Corrections are applied to the BDT shape in order to take into account the effect of the different trigger requirements used for the signal and the B 0 The invariant mass line shape of the signal events is described by a Crystal Ball function (CB) [31] with two tails, left and right, defined by two parameters each. The values of the parameters depend on the momentum resolution, the momentum scale and the amount of bremsstrahlung radiation recovered.
The signal shape parameters are obtained from simulation, but need to be reweighted to account for their dependency on the event multiplicity, which affects the amount of bremsstrahlung radiation recovered and differs between data and simulation. We use the number of hits in the scintillating pad detector (N SPD ) as a measure of the event multiplicity. The distribution of N SPD for B 0 (s) → e ± µ ∓ signal candidates is obtained from a B + → J/ψ(µ + µ − )K + data sample, which is selected with the same trigger conditions as the signal, ensuring a similar distribution of N SPD . The signal mass shape parameters are determined by reweight- This reweighting technique is used also for a J/ψ → e + e − simulated sample and the reweighted parameters are then compared with those obtained with a J/ψ → e + e − sample in data. The difference between the mean values of the J/ψ → e + e − mass in data and simulation (+0.16%) is applied as a systematic shift to the peak values of the B 0 → e ± µ ∓ and B 0 s → e ± µ ∓ invariant mass in simulation. A systematic uncertainty is added to the B 0 (s) → e ± µ ∓ mass parameters when the differences in the values of the other mass parameters for the J/ψ → e + e − sample in data and SPDreweighted simulation are larger than their statistical uncertainties.
The signal region, defined by the invariant mass window [5.1, 5.5] GeV/c 2 , retains (85.0 ± 0.1 stat ± 5.0 syst )% and (82.0 ± 0.1 stat ± 5.0 syst )% of the B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ signal decays, respectively. The systematic uncertainties on these fractions are evaluated with pseudo-experiments that fluctuate each parameter of the mass lineshape according to its uncertainty. The width of the corresponding fraction distribution is taken as the systematic uncertainty.
The B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ yields are translated into branching fractions according to where N norm = 10 120 ± 920 is the number of signal events in the normalization channel and is determined from the total yield of the B 0 (s) → h + h − channel and the fraction of B 0 → K + π − events in the inclusive sample. The systematic uncertainty is comparable to the statistical one and is dominated by the maximum spread in the yield obtained by fitting the same B 0 (s) → h + h − dataset with different fit models [29,30]. The branching fraction of the normalization channel is B norm = (1.94 ± 0.06) × 10 −5 [32] and N B 0 (s) →e ± µ ∓ is the number of observed signal events. The factors f d and f s indicate the probabilities that a b quark fragments into a B 0 or B 0 s meson, respectively. We use f s /f d = 0.256 ± 0.020 measured in pp collision data at √ s = 7 TeV [33]. The measured dependence of f s /f d on the B meson p T [33] is found to be negligible for this analysis.
The efficiency sig(norm) for the signal (normalization) channel is the product of the reconstruction efficiency of the final state particles including the geometric detector acceptance, the selection efficiency and the trigger efficiency. The ratios of acceptance, reconstruction and selection efficiencies are computed with simulation. A systematic uncertainty is assigned to these ratios, to take into account the difference between the tracking efficiencies measured in data and predicted in simulation. Reweighting techniques are used to correct distributions in the simulation that do not match those from data, in particular for those variables that depend on N SPD . The trigger efficiency of L0 and HLT1 on signal decays is evaluated using data, while the HLT2 efficiency is evaluated in simulation after validation with control samples. The electron and muon identification efficiencies are evaluated from data using the B + → J/ψ(µ + µ − )K + and B + → J/ψ(e + e − )K + control samples. The two normalization factors α B 0 s and α B 0 are determined to be α B 0 s = (1.1 ± 0.2) × 10 −9 and α B 0 = (2.8 ± 0.5) × 10 −10 .
The BDT range is divided into eight bins with boundaries at 0.0, 0.25, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0. The number of expected combinatorial background events in each BDT bin and the invariant mass signal region is determined from data by fitting to an exponential function events in the mass sidebands, defined by [4.9, 5.0] GeV/c 2 and [5.5, 5.9] GeV/c 2 . The invariant mass distributions of the selected candidates in BDT bins and the binned BDT distributions for the signals and the combinatorial background samples are available in Supplemental Material [34].
In the exponential function both the slope and the normalization are allowed to vary. The systematic un-certainty on the estimated number of combinatorial background events in the signal regions is determined by fluctuating the number of events observed in the sidebands according to a Poisson distribution, and by varying the exponential slope according to its uncertainty. As a cross-check, two other models, the sum of two exponential functions and a single exponential fitted to the right sideband only, have been used and provide consistent background estimates inside the signal region.
The low-mass sideband and the signal region are potentially polluted by exclusive backgrounds. The background from B + c → J/ψ(µ + µ − )e + ν e and B + c → J/ψ(e + e − )µ + ν µ decays is evaluated assuming the branching fraction value from Ref. [35]. The de- b → pl − ν l and B + → π + l + l − (where l ± = e ± or µ ± ) are potential backgrounds if the hadrons are misidentified as electrons or muons. The B 0 → π − l + ν l and B 0 (s) → h + h − branching fractions are taken from Ref. [32]. The B + → π + l + l − branching fraction is taken from Ref. [36]. The theoretical estimates of the Λ 0 b → pl − ν l and B 0 s → K − l + ν l branching fractions are taken from Refs. [37] and [38], respectively. We use the Λ 0 b fragmentation fraction f Λ 0 b measured by LHCb [39] and account for its p T dependence.
The mass and BDT distributions of these background modes are evaluated from simulated samples, using the probabilities of misidentifying kaon, pion and proton as muon or electron as functions of momenta and transverse momenta, which are determined from D * + → D 0 (→ K − π + )π + and Λ → pπ − data samples. The yield of the B 0 (s) → h + h − → e + µ − peaking background in each BDT bin is obtained by multiplying the B 0 (s) → h + h − yields obtained by fitting the invariant mass distribution of an inclusive B 0 (s) → h + h − sample in BDT bins [29,30] with the probabilities of misidentifying kaon, pion and proton as muon or electron as functions of momenta and transverse momenta, as determined from control samples. The mass lineshape of the B 0 (s) → h + h − → e + µ − peaking background is obtained from a simulated sample of doubly-misidentified B 0 (s) → h + h − events. Apart from B 0 (s) → h + h − , all background modes are normalized relative to the B + → J/ψ(µ + µ − )K + decay. We assume f u = f d where f u is the B + fragmentation fraction.
The Λ 0 b → pl − ν l and the B + c → J/ψ(µ + µ − )e + ν e and B + c → J/ψ(e + e − )µ + ν µ modes are the dominant exclusive modes in the range BDT> 0.5, where the combinatorial background is reduced by a factor ∼ 500 according to simulation. These decay modes have an invariant mass distribution that is compatible with an exponential in the region [4.9-5.9] GeV/c 2 , and hence are taken into account by the exponential fit to the mass sidebands.
For each BDT bin we count the number of candidates observed in the signal region, and compare to the expected number of signal and background events.
The systematic uncertainties in the background and signal predictions in each bin are computed by varying the normalization factor, and the mass and BDT shapes within their Gaussian uncertainties.
The results for the B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ decays are summarized in Table 1. In the high BDT range, the observed number of candidates is in agreement with the number of expected exclusive backgrounds in the signal region. The compatibility of the observed distribution of events with that expected for a given branching fraction hypothesis is computed with the CL s method [40].
The expected and observed CL s values are shown in Fig. 1 for the B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ channels, as a function of the assumed branching fraction. The expected and measured limits for B 0 s → e ± µ ∓ and B 0 → e ± µ ∓ at 90 % and 95 % C.L. are shown in Table 2. Note that since the same events are used to set limits for both B 0 s and B 0 decays, the results are strongly correlated. The inclusion of systematic uncertainties increases the expected B 0 → e ± µ ∓ and B 0 s → e ± µ ∓ upper limits by ∼ 20%. The systematic uncertainties are dominated by the uncertainty in the interpolation of the background yields inside the signal region. The observed limits are ∼ 1 σ below the expectation due to the lower than expected numbers of observed events in the fourth and last BDT bins.
In the framework of the Pati-Salam model, the relation linking the B 0 (s) → e ± µ ∓ branching fractions and the leptoquark mass (M LQ ) [10] is

23
. The B 0 and B 0 s masses, m B 0 and m B 0 s , and the average lifetimes, τ B 0 and τ B 0 s , are taken from Ref. [32]. The factors F B 0 = 0.190 ± 0.004 GeV and F B 0 s = 0.227 ± 0.004 GeV are the decay constants of the B 0 and B 0 s mesons [41], and m b and m t are the bottom and top quark masses [32], respectively, computed in the MS scheme [42]. The value of α s at an arbitrary scale M LQ is determined using the software package rundec [43].
Using the limits on the branching fractions shown in Table 2, we find the following lower bounds for the leptoquark masses if the leptoquark links the τ lepton to the first and second quark generation, M LQ (B 0 s → e ± µ ∓ ) > 107 (101) TeV/c 2 and M LQ (B 0 → e ± µ ∓ ) > 135 (126) TeV/c 2 at 90 (95) % C.L., respectively. When the parameters entering in Eq. 2 are fluctuated within ±1 σ, the limits on the leptoquark masses change by ∼ ±1 TeV.
These are a factor of two higher than the previous bounds. Table 1: Expected background (bkg) from the fit to the data sidebands, and expected B 0 (s) → h + h − → e + µ − events, compared to the number of observed events in the mass signal region, in bins of BDT response.