Search for the baryon- and lepton-number violating decays $B^0\to p\mu^-$ and $B^0_s\to p\mu^-$

A search for the baryon- and lepton-number violating decays $B^0\to p\mu^-$ and $B^0_s\to p\mu^-$ is performed at the LHCb experiment using data collected in proton-proton collisions at $\sqrt{s}$ = 7, 8 and 13 TeV, corresponding to integrated luminosities of 1, 2 and 6 fb$^{-1}$, respectively. No significant signal for $B^0\to p\mu^-$ and $B^0_s\to p\mu^-$ decays is found and the upper limits on the branching fractions are determined to be ${\cal B}(B^0\to p\mu^-)<2.6\,(3.1)\times10^{-9}$ and ${\cal B}(B^0_s\to p\mu^-)<12.1\,(14.0)\times10^{-9}$, respectively, at 90$\%$ (95$\%$) confidence level. These are the first limits on these decays to date.


Introduction
Despite the tremendous success of the Standard Model (SM) of particle physics in the past few decades, there are still some important questions that are not answered. One of them is the observation of the matter-antimatter asymmetry in the universe, which brings a serious challenge to our understanding of nature. In 1967, Andrei Sakharov proposed three necessary conditions that must be satisfied to produce such a large matter-antimatter asymmetry, one of which is baryon number violation (BNV) [1]. Various extensions of the SM that include BNV processes, known as Grand Unified Theory (GUT) models, have been proposed, such as SU(5) [2], SO(10) [3], E 6 [4] and flipped SU (5) [5,6]. These GUT models usually require two hypothetical gauge bosons, X and Y , with electric charges of ± 4 3 e and ± 1 3 e, that couple quarks to leptons and lead to both BNV and lepton number violation (LNV).
Proton decay is a BNV process of the lightest baryon: none of its decay modes, although predicted by many GUT models, have been observed [7]. However, since proton decay only involves first-generation quarks, experimental searches for BNV decays of heavy-flavour hadrons are very important and represent additional probes for new physics effects. Various BNV processes have been searched for in τ , Λ, D, J/ψ, and B decays by the CLEO [8], CLAS [9], BESIII [10][11][12][13] and BABAR [14] experiments, but no evidence has been found so far. The large data samples accumulated by the LHCb experiment are expected to lead to the best sensitivity for investigating BNV decays of B mesons. The B 0 (s) → pℓ − decay modes, with possible hypothetical Feynman diagrams shown in Fig. 1, are relevant BNV and LNV processes that are forbidden in the SM with a prediction of B(b → uuℓ − ) < 2.4 × 10 −27 [15] derived from the constraint of proton stability.
In this paper, we present the first search for B 0 → pµ − and B 0 s → pµ − decays 1 using proton-proton (pp) collisions collected with the LHCb detector and corresponding to an integrated luminosity of 1 fb −1 at a center-of-mass energy √ s = 7 TeV, 2 fb −1 at √ s = 8 TeV and 6 fb −1 at √ s = 13 TeV. The first two data sets are referred to as Run 1 and the latter as Run 2. Two normalisation channels are used: the B 0 → K + π − decay, which has a similar topology to that of the signal, and the B + → J/ψ (µ + µ − )K + decay, due to its high abundance and the high purity that can be achieved at the LHCb experiment. The B 0 (s) → pµ − candidates with pµ − pair invariant mass m pµ − in the range [5067, 5667] MeV/c 2 are selected. To avoid potential biases, B 0 (s) → pµ − candidates in the signal region, m pµ − ∈ [5217, 5457] MeV/c 2 , were not examined until the selection and fitting procedure were finalised.

Detector and simulation
The LHCb detector [16,17] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The online event selection is performed by a trigger [18], which consists of a hardware stage, based on information from the muon and calorimeter systems, followed by a software stage that applies a full reconstruction of the event. At the hardware stage, the presence of a muon with high p T is required, while at the software level both tracks are used. A first software stage requires the presence of at least one track with high p T that is well separated from the PV. It is followed by a second stage that requires a two-track secondary vertex with significant displacement from any PV, and a multivariate algorithm [19,20] is used for the identification of secondary vertices consistent with the decay of a b hadron.
Simulated samples of signal and background channels are used to evaluate geometrical, reconstruction and selection efficiencies, to train multivariate classifiers and to determine the shapes of invariant mass distributions of both signal and background contributions. In the simulation, pp collisions are generated using Pythia [21] with a specific LHCb configuration [22]. Decays of hadronic particles are described by EvtGen [23], in which final-state radiation is generated using Photos [24]. The signal events are generated by assuming the B 0 (s) decays do not produce any preferred polarization of the proton. The interaction of the generated particles with the detector, and its response, are simulated using the Geant4 toolkit [25,26] as described in Ref. [27].

Selection
The B 0 (s) → pµ − candidates are reconstructed by combining two oppositely charged tracks with transverse momentum in the range 0.25 < p T < 40 GeV/c, momentum p < 500 GeV/c, as in Ref. [28]. The tracks are also required to have a good track fit quality, with a track fit χ 2 per degree of freedom smaller than three. Only candidate tracks with χ 2 IP > 25 for any PV are selected, where χ 2 IP is defined as the difference between the vertex-fit χ 2 of the PV formed with and without the particle in question. The distance of closest approach between the two tracks is required to be below 0.1 mm. The proton candidates are required to be in the geometric acceptance of the LHCb muon stations to mimic the selection of the muon pair of the normalisation channel B + → J/ψ (µ + µ − )K + . This requirement has a negligible effect on the signal efficiency. The two candidate tracks are required to form a secondary vertex with a vertex-fit χ 2 per degree of freedom (χ 2 vtx ) smaller than nine. Furthermore, the resulting B 0 (s) candidates must have a decay time less than nine times the B 0 s lifetime, χ 2 IP < 25 with respect to the PV for which the χ 2 IP is minimal and p T > 0.5 GeV/c, to suppress background from random combinations of tracks originating from the PV. Particles forming the B 0 (s) → pµ − candidates are required to be well identified as a proton and a muon [29], using information from the Cherenkov detectors, the calorimeters and the muon stations. The stringent requirements on the p and µ candidates retain more than 60% of the signal candidates and eliminate more than 99% of the background candidates from decays with misidentified particles, particularly the two-body peaking backgrounds, but have almost no discriminating power on the semileptonic decay Λ 0 b → pµ − ν µ , which has the same visible final state particles as the signal channels. This decay represents the main physics background in this analysis. The Λ 0 b → pXµ − ν µ decays with any additional unreconstructed particles (X) are expected to have negligible contribution in the fit region of pµ − mass.
For the normalization channels, the selection for B 0 → K + π − candidates is the same as for the B 0 (s) → pµ − channel, except for the particle identification (PID) criteria. Similarly, the B + → J/ψ (µ + µ − )K + candidate selection is also kept as similar as possible, applying the same selection used for the signal to the dimuon pair from the J/ψ decay. Additionally, loose quality requirements are applied on the J/ψ vertex, χ 2 vtx < 9. Finally, a 100 MeV/c 2 mass window around the known B + mass and a 60 MeV/c 2 mass window around the known J/ψ mass [30] are used.

MLP training and calibration
A Multilayer perceptron (MLP) [31] classifier implemented in the TMVA toolkit [32] is used to separate the B 0 (s) → pµ − signal from the combinatorial background, which arises from random combinations of tracks. The classifier is trained using a sample of simulated B 0 → pµ − events to describe the signal, and a data sample of same-sign pµ + candidates to describe the combinatorial background. The same-sign sample reproduces the distributions of the MLP input variables from the data sidebands (5067 < m pµ − < 5100 MeV/c 2 or 5500 < m pµ − < 5667 MeV/c 2 ). The following input variables are used: the minimum χ 2 IP of the two tracks in the final state and the χ 2 IP of the B 0 candidate, the vertex fit χ 2 of the B 0 decay and its displacement from the production vertex, the B 0 transverse momentum and its proper decay time, the distance of closest approach between the two tracks in the final state, the difference of pseudorapidity between the two final-state tracks, the angular difference between the direction of B 0 momentum and the direction defined by the secondary and primary vertices, and the angular difference between the direction of the µ momentum and the vector perpendicular to the B 0 momentum and the beam axis in the B 0 rest frame. Distributions of two classifier input variables, the vertex fit χ 2 of the B 0 decay and the minimum χ 2 IP of the two tracks in the final state, for the same-sign sample and the data sidebands in Run 2 are compared in Fig. 2.
The response of the MLP classifier is constructed to be uniform in the range [0,1] for signal after the full selection without PID requirements. For background, the MLP response peaks near zero. Its linear correlation with the pµ − pair mass is below 1%. The MLP response is divided into eight intervals with boundaries of 0.0, 0.25, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0. The MLP response is required to be greater than 0.25 in the final pµ − mass distribution fit.
Since the MLP classifier is trained using only kinematic information of a two-body B 0 decay, its response is calibrated using B 0 → K + π − decays. To avoid biases, B 0 → K + π − candidates are selected from candidates where the trigger decision does not depend on the B 0 → K + π − candidates themselves. The number of B 0 → K + π − candidates in each interval of the MLP response is determined by fitting the K + π − invariant mass distribution after the full selection. Furthermore, the candidates are corrected for PID efficiency and weighted to emulate the effect of the B 0 (s) → pµ − triggers. The distributions of the MLP response on B 0 → K + π − simulated samples and data samples show good agreement. The distribution of the MLP response from the B 0 → K + π − data samples is corrected by the ratio of the B 0 → pµ − and B 0 → K + π − MLP responses in simulated samples. The corrected MLP response is shown in Fig. 3 and regarded as the expected probability density function (PDF) of the MLP response for B 0 (s) → pµ − decays, due to negligible kinematic differences between the B 0 and B 0 s decays.

Normalisation
The B 0 (s) → pµ − yields are obtained from a fit to the pµ − invariant mass distribution and translated into branching fractions with the normalisation channel where B, ϵ, and N are the branching fraction, efficiency and yield of the corresponding channel and f sig(norm) indicates the fragmentation fraction of the relevant B meson. The parameter α sig is the single-event sensitivity. The B 0 → K + π − and B + → J/ψ (µ + µ − )K + decays are used as control channels. The B 0 → K + π − mode is used as a proxy to determine the MLP PDF for signal channels (Sec. 4) and the B + → J/ψ (µ + µ − )K + mode is used to extract the signal branching fraction. To validate the normalisation procedure, the ratio between the measured branching fractions of B 0 → K + π − and B + → J/ψ (µ + µ − )K + is determined as where ε and N are the selection efficiency and yield. Using the same fit models as in Ref. [33], the yields of the normalisation channels are obtained through fits to the mass distributions of the candidates separately for Run 1 and Run 2 datasets. The results of the fits are shown in Figs. 4 and 5.
The selection efficiency for signal and normalisation channels includes efficiencies due to detector acceptance, tracking, reconstruction, trigger, and PID. All parts of the efficiency are evaluated using simulation, in which trigger and PID efficiencies are corrected using data [34,35].
Calibration samples where the trigger decision is independent of the candidate decay products are used to study the trigger efficiency. From these samples, B + → J/ψ (µ + µ − )K +    candidates are used to estimate the trigger efficiency for muons as a function of the muon p T and IP. No proton trigger is used in this analysis. For the two normalisation channels, B + → J/ψ (µ + µ − )K + data samples are also used to determine the trigger efficiency map as a function of the B meson p T . The resulting trigger efficiency maps are then applied to the simulation to determine the integrated trigger efficiency for a specific channel.
Particle identification efficiencies are evaluated using high-purity control samples of each particle species obtained from data [35,36]. These control samples are obtained by means of kinematic requirements only, with muons obtained from J/ψ → µ + µ − and B + → J/ψK + decays, pions and kaons from D 0 → K − π + decays selected via D * + → D 0 π + decays, and protons from Λ → pπ − and Λ + c → pK − π + decays. The muon PID efficiencies are evaluated as a function of the muon p T and η. The PID efficiencies for different hadrons are evaluated as a function of p and η. The single-track efficiencies are then combined and averaged using the kinematic distributions of the corresponding simulated sample.
The ratios R norm between the measured branching fractions of B 0 → K + π − and B + → J/ψ (µ + µ − )K + in Eq. 2 are R norm = 0.329 ± 0.012 and R norm = 0.341 ± 0.010 for Run 1 and Run 2, respectively, where the uncertainties are statistical only and include the efficiency uncertainties from both channels. These values are in agreement with the ratio of the world averages of these branching fractions, R PDG = 0.326 ± 0.012 [7].

Background contributions
In addition to the combinatorial background, the signal region is also potentially contaminated by background contributions from exclusive decays where one or more of the final-state particles are misidentified or not reconstructed. The most dangerous of these backgrounds are hadronic X b → h + h ′− decays, such as: and partially reconstructed semileptonic decays: The partially reconstructed decays do not peak in the signal region, but are potentially abundant. The expected number of candidates from each possible background decay that pass the signal selection is evaluated using simulation. The candidates from B 0 (s) decays are normalised to the number of B + → J/ψ (µ + µ − )K + decays found in data as where N X is the expected number of candidates from the X decay that fall into the B 0 (s) → pµ − signal mass window; N B + →J/ψ (µ + µ − )K + is the yield of B + → J/ψ (µ + µ − )K + decays in the data; f q is the fragmentation fraction for a b quark to produce either a meson with secondary quark content q or a baryon; and B and ϵ are the branching fraction and selection efficiency for a particular channel, respectively. For Λ 0 b decays, the fragmentation fraction f Λ 0 b /f d depends on the p T of the Λ 0 b baryon [37,38]. The Λ 0 b → pK − decay is used as a normalisation channel to estimate the number of background Λ 0 b candidates, thereby removing a potential bias due to the correlation between f Λ 0 b /f d and p T . The selection criteria on Λ 0 b → pK − candidates are largely the same as for the B 0 → K + π − normalisation channel except for the p and K identification requirements.
The expected total number of X b → h + h ′− decays in the full MLP range is found to be negligible. The only background sources which are found to be relevant are the semileptonic decays Λ 0 b → pµ − ν µ , B 0 s → K − µ + ν µ and B 0 → π − µ + ν µ , with yields estimated to be roughly 8500 ± 2600, 50 ± 13 and 310 ± 16, respectively, for Run 1 and 21000 ± 6400, 64 ± 17 and 410 ± 23, respectively, for Run 2. The Run 2 to Run 1 yield ratios of B 0 s → K − µ + ν µ and B 0 → π − µ + ν µ are considerably smaller than that of Λ 0 b → pµ − ν µ , which is mainly due to improved discrimination of protons from pions and kaons in Run 2. These three decay modes are included in the final fit model.

Measurement of signal branching fractions
The branching fractions of the signal decays are determined using an unbinned extended maximum-likelihood fit to the pµ − mass distributions, performed simultaneously on all the subsets. The fit is performed in the mass window 5067 < m pµ − < 5667 MeV/c 2 . The full fit model in the data-taking period r and MLP interval i, PDF r,i Full (m) is: where P is the Poisson probability of measuring N , given the fitted yields. The signal model, denoted PDF sig , is described in detail below. Also PDF com,r is the PDF for combinatorial background, which is described with an exponential function with an independent shape parameter in each data-taking period and independent yield N com,r,i in each subset. PDF r ,i,j represents the PDF for the physical background with index j, which is modelled based on histograms determined from simulation and smoothed with a second order interpolation. N r,j and F r,i,j represent the expected yield and the yield fraction in each subset for physical backgrounds. PDF sig is a sum of a double-sided Crystal Ball (DSCB) function [39] and a Gaussian function. The signal shape parameters are obtained from simulation, with data-driven scale factors applied to the core resolution to correct for minor data-simulation discrepancies. For this purpose, since there is no appropriate control channel with a proton and a muon in the final state, Λ 0 b → pK − and J/ψ → µ + µ − decays are analysed comparing the invariant mass resolution in data and simulation. The results are then combined to reproduce the effect on an pµ − final state. Corrections to the mass resolution are of the order of 10%. The mass shape parameters are found to be independent of the particular MLP interval and only two models, for the Run 1 and Run 2 data samples, are used.
The yield in each subset, N sig,r,i , is expressed as N sig,r,i = B sig f sig ε PID,r,i ε others,r F sig,r,i N norm,r B norm f norm ε norm,r .
Here, B and f represent the branching fraction and fragmentation fraction of the signal channel or the normalisation channel. The quantities N norm,r and ε norm,r are the yield and efficiency of the normalisation mode B + → J/ψ (µ + µ − )K + . Furthermore, ε PID,r,i and ε others,r are the PID efficiency and the total efficiency without the PID requirements for signal channels. F r,i,j represents the fraction in each subset for signal channels (Fig. 3). The quantities B sig and N com,r,i are free to vary in the fit, while f sig , N norm,r , B norm , f norm , ε norm,r , N r,j , F r,i,j , ε PID,r,i , ε others,r , F sig,r,i and N norm,r are Gaussian-constrained, according to the expected value and uncertainty. The result of the fit in each subset is shown in Figs. 6 and 7. The resulting branching fractions are: where the first uncertainty is statistical and the second systematic. No significant excess of B 0 → pµ − or B 0 s → pµ − decays is observed and upper limits on the branching fractions are set using the CL s method [40] with a one-sided test statistic [41] as implemented in Refs. [42,43]. The one-sided test statistic for a given branching fraction value is defined as twice the negative logarithm of the profile likelihood ratio if it is larger than the measured branching fraction and zero otherwise. Its distribution is determined from pseudoexperiments, where nuisance parameters are set to their best fit values when generating pseudoexperiments. The central values of the Gaussian-constraints are independently varied within their uncertainties for each pseudoexperiment as described in Ref. [44]. With the inclusion of systematic effects as discussed below, the CL s curves as a function of the branching ratios are shown in Fig. 8, and the upper limits at 90 (95)% confidence level (CL) are reported in Table 1, where the observed upper limits are both above the expected ones. This is likely due to a fluctuation of the concentration of Λ 0 b → pµ − ν µ events around the known B 0 (s) masses in MLP regions with large MLP response values.  Several systematic uncertainties can affect the evaluation of the limit on the B 0 → pµ − and B 0 s → pµ − branching fractions through the normalisation formula (Eq. 1). The systematic uncertainties considered are related to the fraction of signal and physical backgrounds in each MLP interval; expected total yields of physical backgrounds; signal efficiency; mass resolution of signal shape; fragmentation fraction f sig /f norm (only applicable for the B 0 s mode); branching fraction, efficiency and yield of the normalisation channel B + → J/ψ (µ + µ − )K + . The systematic uncertainties on the efficiencies of the signal and normalisation channels include the uncertainties arising from modelling the dependencies of the trigger (PID) efficiency maps. The effect on the pµ − mass distribution and yield fraction from Λ 0 b → pµ − ν µ form factor uncertainty and different Λ 0 b → pµ − ν µ physics models have been studied and found to be negligible. No systematic uncertainty is assigned due to the assumptions of unpolarized final states. These systematic uncertainties are taken into account for the limit computation by constraining the respective parameters in the likelihood fit with a Gaussian distribution having the central value of the parameter as the mean and its uncertainty as the width. The overall impact on the limits is evaluated to be 4% on B 0 → pµ − and 11% on B 0 s → pµ − .

Summary
In summary, a search for the LNV and BNV decays B 0 → pµ − and B 0 s → pµ − is performed using the full Run 1 and Run 2 data samples of the LHCb experiment, using a sample of proton-proton collision data corresponding to a total integrated luminosity of 9 fb −1 . No excesses are observed for these two modes and upper limits on the branching fractions are set to be B(B 0 → pµ − ) < 2.6 (3.1) × 10 −9 and B(B 0 s → pµ − ) < 12.1 (14.0) × 10 −9 at 90% (95%) CL. These results represent the first upper limits on these decays to date.