Search for $B^+ \to \mu^+\, \nu_\mu$ and $B^+ \to \mu^+\, N$ with inclusive tagging

We report the result for a search for the leptonic decay of $B^+ \to \mu^+ \, \nu_{\mu}$ using the full Belle data set of 711 fb${}^{-1}$ of integrated luminosity at the $\Upsilon(4S)$ resonance. In the Standard Model leptonic $B$-meson decays are helicity and CKM suppressed. To maximize sensitivity an inclusive tagging approach is used to reconstruct the second $B$ meson produced in the collision. The directional information from this second $B$ meson is used to boost the observed $\mu$ into the signal $B$ meson rest-frame, in which the $\mu$ has a monochromatic momentum spectrum. Though its momentum is smeared by the experimental resolution, this technique improves the analysis sensitivity considerably. Analyzing the $\mu$ momentum spectrum in this frame we find $\mathcal{B}(B^+ \to \mu^+ \, \nu_\mu) = \left( 5.3 \pm 2.0 \pm 0.9 \right) \times 10^{-7}$ with a one-sided significance of 2.8 standard deviations over the background-only hypothesis. This translates to a frequentist upper limit of $\mathcal{B}(B^+ \to \mu^+ \, \nu_{\mu})<8.6 \times 10^{-7}$ at 90% CL. The experimental spectrum is then used to search for a massive sterile neutrino, $B^+ \to \mu^+ \, N$, but no evidence is observed for a sterile neutrino with a mass in a range of 0 - 1.5 GeV. The determined $B^+ \to \mu^+ \, \nu_{\mu}$ branching fraction limit is further used to constrain the mass and coupling space of the type II and type III two-Higgs-doublet models.

We report the result for a search for the leptonic decay of B + → µ + ν µ using the full Belle data set of 711 fb −1 of integrated luminosity at the Υ(4S) resonance. In the Standard Model leptonic B-meson decays are helicity and CKM suppressed. To maximize sensitivity an inclusive tagging approach is used to reconstruct the second B meson produced in the collision. The directional information from this second B meson is used to boost the observed µ into the signal B meson rest-frame, in which the µ has a monochromatic momentum spectrum. Though its momentum is smeared by the experimental resolution, this technique improves the analysis sensitivity considerably. Analyzing the µ momentum spectrum in this frame we find B(B + → µ + ν µ ) = (5.3 ± 2.0 ± 0.9) × 10 −7 with a one-sided significance of 2.8 standard deviations over the background-only hypothesis. This translates to a frequentist upper limit of B(B + → µ + ν µ ) < 8.6 × 10 −7 at 90% CL. The experimental spectrum is then used to search for a massive sterile neutrino, B + → µ + N , but no evidence is observed for a sterile neutrino with a mass in a range of 0 -1.5 GeV. The determined B + → µ + ν µ branching fraction limit is further used to constrain the mass and coupling space of the type II and type III two-Higgs-doublet models.

I. INTRODUCTION
Precision measurements of leptonic decays of B mesons offer a unique tool to test the validity of the Standard Model of particle physics (SM). Produced by the annihilation of theb-u quark-pair and the subsequent emission of a virtual W + -boson decaying into a antilepton and neutrino, this process is both Cabibbo-Kobayashi-Maskawa (CKM) and helicity suppressed in the SM. The branching fraction of the B + → + ν [1] process is given by (1) with G F denoting Fermi's constant, m B and m the B meson and lepton masses, respectively, and |V ub | the relevant CKM matrix element of the process. Further, τ B denotes the B meson lifetime and the decay constant f B parametrizes the b-u annihilation process, with A µ =bγ µ γ 5 u the corresponding axial-vector current and p µ the B meson four-momentum. The value of f B has to be determined using non-perturbative methods, such as lattice QCD [2] or QCD sum-rule calculations [3,4]. In this paper an improved search for B + → µ + ν µ using the full Belle data set is presented. Using the results of f B = 184 ± 4 MeV [2] and either inclusive or exclusive world averages for |V ub | [5] one finds an expected SM branching fraction of B(B + → µ + ν µ ) = (4.3 ± 0.8) × 10 −7 or B(B + → µ + ν µ ) = (3.8 ± 0.4) × 10 −7 , respectively. This implies an expected total of approximately 300 signal events in the entirety of the Belle data set of 711 fb −1 of integrated luminosity recorded at the Υ(4S) resonance. Thus it is imperative to maximize the overall selection efficiency, which rules out the use of exclusive tagging algorithms, as even advanced machine learning based implementations such as Ref. [6] only achieve efficiencies of a few percent. Events containing a high momentum muon candidate are identified as potential signal events, and the additional charged particles and neutral energy depositions in the rest of the event (ROE) are used to reconstruct the second B meson produced in the collision process. With such an inclusive reconstruction one reduces the background due to non-resonant e + e − → qq (q = u, d , s, c) continuum processes, and, after a dedicated calibration, it is possible to deduce the direction of the signal B meson. This is used to carry out the search in the signal B rest frame, in which the B + → µ + ν µ decay produces a muon with a monochromatic momentum of p B µ = 2.64 GeV. The experimental resolution on the boost vector reconstructed from ROE information broadens this signal signature. The use of this frame, which enhances the expected sensitivity of the search, is the main improvement over the preceding analysis, published in Ref. [7]. Further, the modeling of the crucial b → u ν semileptonic and continuum backgrounds has been improved with respect to the preceding analysis. In Ref. [7] a 90% confidence interval of [2.9, 10.7] × 10 −7 for the B + → µ + ν µ branching fraction was determined, while the most stringent 90% upper limit for this quantity that has been determined is 1 × 10 −6 [8]. In the presence of new physics interactions or particles, the CKM and helicity suppression of the B + → µ + ν µ decay can be lifted: the presence of, for instance, a charged Higgs boson, favored in many supersymmetric extensions of the SM, could strongly enhance the observed B + → + ν branching fractions. Leptoquarks could have a similar effect. Another interesting exotic particle whose existence can be investigated with this decay are sterile neutrinos. This hypothetical particle acts as a singlet under the fundamental symmetry group of the SM, i.e. they carry no color charge, no weak isospin, nor weak hypercharge quantum numbers. Further, sterile neutrinos do not couple to the gauge bosons of the SM, but their existence could explain, for instance, the dark matter content of the universe [9] or the smallness of the neutrino mass terms [10]. The only possibility for a sterile neutrino N to occur in a B + → µ + N final state is due to the existence of a non-SM mediator. Further, the mass of the sterile neutrino has to be m N < 5.17 GeV = m B − m µ and in the present analysis we are able to probe a mass range of m N ∈ [0, 1.5) GeV. In Fig. 1 the SM and a selection of beyond the SM (BSM) processes are shown.
The rest of this paper is organized as follows: Section II summarizes the used data set, simulated samples and reconstruction steps. Section III outlines the inclusive tag reconstruction and calibration of its direction. In addition, the employed background suppression strategies and the used categorization are summarized. In Section IV the validation of the inclusive tag reconstruction and calibration using B + →D 0 π + decays is described. Section V introduces the statistical methods used to determine the B + → µ + ν µ signal yield. In Section VI systematic uncertainties of the measurement are discussed and Section VII documents sideband studies to validate the modeling of the crucial b → u ν semileptonic and continuum backgrounds. Section VIII presents the main findings of the paper. Finally, Section IX contains a summary and our conclusions.

II. DATA SET AND SIMULATED SAMPLES
We analyze the full Belle data set of (772 ± 10) × 10 6 B meson pairs, produced at the KEKB accelerator complex [11] with a center-of-mass energy (c.m.) of √ s = 10.58 GeV at the Υ(4S) resonance. In addition, we use 79 fb −1 of collisions recorded 60 MeV below the Υ(4S) resonance peak to derive corrections and carry out crosschecks. The Belle detector is a large-solid-angle magnetic spectrometer that consists of a silicon vertex detector (SVD), a 50-layer central drift chamber (CDC), an array of aerogel thresholdČerenkov counters (ACC), a barrel-like arrangement of time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter comprised of CsI(Tl) crystals (ECL) located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux return located outside of the coil is instrumented to de- tect K 0 L mesons and to identify muons (KLM). A more detailed description of the detector, its layout and performance can be found in Ref. [12] and in references therein.
Charged tracks are identified as electron or muon candidates by combining the information of multiple subdetectors into a lepton identification likelihood ratio, L LID . For electrons the identifying features are the ratio of the energy deposition in the ECL with respect to the reconstructed track momentum, the energy loss in the CDC, the shower shape in the ECL, the quality of the geometrical matching of the track to the shower position in the ECL, and the photon yield in the ACC [13]. Muon candidates are identified from charged track trajectories extrapolated to the outer detector. The identifying features are the difference between expected and measured penetration depth as well as the transverse deviation of KLM hits from the extrapolated trajectory [14]. Charged tracks are identified as pions or kaons using a likelihood classifier which combines information from the CDC, ACC, and TOF subdetectors. In order to avoid the difficulties understanding the efficiencies of reconstructing K 0 L mesons, they are not explicitly reconstructed in what follows.
Photons are identified as energy depositions in the ECL without an associated track. Only photons with an energy deposition of E γ > 100 MeV, 150 MeV, and 50 MeV in the forward endcap, backward endcap and barrel part of the calorimeter, respectively, are considered.
We carry out the entire analysis in the Belle II analysis software framework [15]: to this end the recorded Belle collision data and simulated Monte Carlo (MC) samples were converted using the software described in Ref. [16]. MC samples of B meson decays and continuum processes are simulated using the EvtGen generator [17]. The used sample sizes correspond to approximately ten and six times the Belle collision data for B meson and continuum decays, respectively. The interactions of particles traversing the detector are simulated using Geant3 [18]. Electromagnetic final-state radiation (FSR) is simulated using the PHOTOS [19] package. The efficiencies in the MC are corrected using data-driven methods.
Signal B + → µ + ν µ and B + → µ + N decays are simulated as two-body decay of a scalar initial-state meson to a lepton and a massless antineutrino. The effect of the non-zero sterile neutrino mass is incorporated by adjusting the kinematics of the simulated events.
The most important background processes are semileptonic b → u ν decays and continuum processes, which both produce high-momentum muons in a momentum range similar to the B + → µ + ν µ process. Charmless semileptonic decays are produced as a mixture of specific exclusive modes and non-resonant contributions: Semileptonic B → π + ν decays are simulated using the BCL form factor parametrization [20] with central values and uncertainties from the global fit carried out by Ref. [21]. The processes of B → ρ + ν and B → ω + ν are modeled using the BCL form factor parametrization. We fit the measurements of Refs. [22][23][24] in combination with the light-cone sum rule predictions of Ref. [25] to determine a set of central values and uncertainties. The subdominant processes of B → η + ν and B → η + ν are modeled using the ISGW2 model [26]. In addition to these narrow resonances, we produce non-resonant b → u ν decays with at least two pions in the final state using the DFN model [27]. In this model, the triple differential rate is regarded as a function of the four-momentum transfer squared (q 2 ), the lepton energy (E B ), and the hadronic invariant mass squared (m 2 X ) at next-to-leading order precision in the strong coupling constant α s . The triple differential rate is convolved with a non-perturbative shape function using an ad-hoc exponential model. The free parameters in this model are the b quark mass in the 1S scheme, m 1S b = (4.69 ± 0.04) GeV and a non-perturbative parameter a = 1.9±0.5. The values of these parameters were determined in Ref. [21] from a fit to b → c ν information. The non-perturbative parameter a is related to the average momentum squared of the b quark inside the B meson and controls the second moment of the shape function. It is defined as a = 3Λ 2 −λ 1 −1 with the binding energy Λ = m B − m 1S b and the hadronic matrix element expectation value λ 1 . Hadronization of parton-level DFN predictions for the b → u ν process is accomplished using the JETSET algorithm [28] to produce two or more final state mesons. The inclusive and exclusive b → u ν predictions are combined using a socalled 'hybrid' approach, which is a method originally suggested by Ref. [29]: to this end we combine both predictions such that the partial branching fractions in the triple differential rate of the inclusive (∆B incl ijk ) and combined exclusive (∆B excl ijk ) predictions reproduce the inclusive values. This is achieved by assigning weights to the inclusive contributions w ijk such that with i, j, k denoting the corresponding bin in the three dimensions of q 2 , E B , and m X : To study the model dependence of the DFN shape function and possible effects of next-to-next-to-leading order corrections in α s , we also determine weights using the BLNP model of Ref. [30]. The modeling of simulated continuum background processes is corrected using a data-driven method, which was first proposed in Ref. [31]: a boosted decision tree (BDT) is trained to distinguish between simulated continuum events and the recorded off-resonance data sample. This allows the BDT to learn differences between both samples, and a correction weight, w = p/ (1 − p), accounting for differences in both samples can be derived directly from the classifier output p. As input for the BDT we use the same variables used in the continuum suppression approach (which is further detailed in Section III) and, additionally, the signal-side muon momentum in the signal B meson frame.
The semileptonic background from b → c ν decays is dominated by B → D + ν and B → D * + ν decays. The B → D + ν form factors are modeled using the BGL form factors [32] with central values and uncertainties taken from the fit in Ref. [33]. For B → D * + ν we use the BGL implementation proposed by Refs. [34,35] with central values and uncertainties from the fit of the preliminary measurement of Ref. [36]. The measurement is insensitive to the precise details of the modeling of b → c ν involving higher charm resonances.
For the contributions of B + → µ + ν µ γ we use the recent experimental bounds of Ref. [37]. In this process, structure-dependent corrections, which are suppressed by the electromagnetic coupling constant α em , lift the helicity suppression of the B + → µ + ν µ decay. We simulate this process using the calculation of Ref. [38] and only allow daughter photons with E γ > 300 MeV, to avoid overlap with the FSR corrections simulated by PHOTOS as corrections to the B + → µ + ν µ final state. In the following, we treat these two processes separately.
The small amount of background from rare b → s/d processes is dominated by B + → K 0 L π + decays. Subdominant contributions are given by the decays B + → K + π 0 and B 0 → ρ + π − . We adjust those branching fractions to the latest averages of Ref. [5]. Table I summarizes the branching fractions used for all important background processes.

III. ANALYSIS STRATEGY, INCLUSIVE TAG RECONSTRUCTION AND CALIBRATION
We select BB candidate events by requiring at least three charged particles to be reconstructed and a significant fraction of the c.m. energy to be deposited in the ECL. We first reconstruct the signal side: a muon candidate with a momentum of p * µ > 2.2 GeV in the c.m. frame of the colliding e + e − -pair. The candidate is required to have a distance of closest approach to the nominal interaction point transverse to and along the beam axis of dr < 0.5 cm and |dz| < 2 cm, respectively. This initial selection results in a signal-side efficiency of ≈ 82.2%. After this the remaining charged tracks and neutral depositions are used to reconstruct the ROE to allow us to boost this signal muon candidate into the rest frame of the signal-side B meson. A looser selection on the ROE tracks is imposed, dr < 10 cm and |dz| < 20 cm, to also include charged particle candidates which are displaced from the interaction region. All ROE charged particles are treated as pions and no further particle identification is performed. Track candidates with a transverse momentum of p T < 275 MeV do not leave the CDC, but curl back into the detector. To avoid double counting of those tracks, we check if such are compatible with another track. If the track parameters indicate that this is the case, we veto the lower momentum track. When we combine the momentum information with ROE photon candidates (reconstructed as described in Section II) we determine the three-momentum (p lab tag ) and energy (E lab tag ) of the tag-side B meson in the laboratory frame as Here p lab i and E lab j denote the three-momentum of tracks and photons in the ROE. We proceed by boosting the tagside four-vector into the c.m. frame of the e + e − -collision. Due to the two-body nature of the Υ(4S) → BB decay, we have precise knowledge of the magnitude of tag-and signal-side B meson in this frame: |p * B | = 330 MeV. We thus correct after the boost the energy component of the tag-side four-vector to be exactly keeping the direction of the three-momentum unchanged. This improves the resolution with respect to using the boosted absolute three-momentum p * tag . Due to the asymmetric beam energies of the colliding e + e − -pair, all produced B meson decay products are boosted in the positive z direction in the laboratory frame. Thus it is more likely that charged and neutral particles escape the Belle detector acceptance in the forward region and bias the inclusive tag reconstruction. This bias degrades the resolution of the reconstructed z component of the p * tag momentum vector. The resolution is significantly improved by applying a calibration function derived from simulated e + e − → Υ(4S) → BB decays, where one B decays into a µν µ -pair. The goal of this function is to map the reconstructed mean momentum z component, p * tag z , to the mean of the simulated true distribution. The functional dependence between the reconstructed and true momentum z component is shown in Fig. 2. In addition, an over- all correction factor ζ is applied to the calibrated threemomentum, chosen such that the difference between the corrected and the simulated three-momentum becomes minimal. The corrected tag-side z and transverse momentum components are then with f the calibration function. The absolute difference between corrected and simulated three-momentum is found to be minimal for ζ = 0.58. Using the calibrated tag-side B meson three-momentum p * tag,cal , we boost the signal-side muon candidate into the signal-side B meson rest frame using Figure 3 compares the muon momentum spectrum for signal B + → µ + ν µ decays in the e + e − c.m. frame with the obtained resolution in the B rest frame (further denoted as p B µ ) using the calibrated momentum vector. Carrying out the boost into the approximated B meson rest frame improves the resolution of the reconstructed muon momentum by 7 % with respect to the resolution in the c.m. frame.
To reduce the sizable background from continuum processes, a multivariate classifier using an optimized implementation of gradient-BDTs [39] is used and trained to distinguish B + → µ + ν µ signal decays from continuum processes. The BDT exploits the fact that the event topology for non-resonant e + e − -collision processes differ significantly from the resonant e + e − → Υ(4S) → BB process. Event shape variables, such as the magnitude of the thrust of final-state particles from both B mesons, the reduced Fox-Wolfram moment R 2 , the modified Fox-Wolfram moments [40] and CLEO Cones [41], are highly discriminating. To these variables we add as additional inputs to the BDT the number of tracks in the ROE, the number of leptons (electrons or muons) in the ROE, the normalized beam-constrained mass of the tag-side B meson defined as and the normalized missing energy defined as with E * tag,reco denoting the energy from boosting the ROE four-vector from the laboratory into the c.m. frame. This list of variables and p B µ are used in the data-driven correction described in Section II to correct the simulated continuum events. We apply a loose set of ROE preselection cuts: only events with at least two tracks, fewer than three leptons, m tag bc > 0.96, ∆ E ∈ [−0.5, 0.1), and R 2 < 0.5 are further considered. Figure 4 compares the classifier output C out and p B µ distributions of the predicted simulated and corrected continuum contribution with recorded off-resonance collision events. Both variables show good agreement.
Using this classifier and the cosine of the angle between the calibrated signal B meson in the c.m. system and the muon in the B rest frame (cos Θ Bµ ) we define four mutually exclusive categories. The first two of these are signal enriched categories with C out ∈ [0.98, 1) and split with respect to their cos Θ Bµ values. For B + → µ + ν µ signal decays no preferred direction in cos Θ Bµ is expected. For the semileptonic and continuum background events, which pass the selection, the muons are emitted more frequently in the direction of the reconstructed B meson candidate. The second two categories have C out ∈ [0.93, 0.98), and they help separate b → u ν and continuum processes from B + → µ + ν µ signal decays. Table II summarizes the four categories. The chosen cut values were determined using a grid search and by fits to Asimov data sets (using the fit procedure further described in Section V).
In Section VII the signal-depleted region of C out ∈ [0.9, 0.93) is analyzed and simultaneous fits in two categories, cos Θ Bµ < 0 and cos Θ Bµ > 0, are carried out to validate the modeling of the important b → u ν background and to extract a value of the inclusive B(B → X u + ν) branching fraction. The selection efficiencies of B + → µ + ν µ signal and the background processes are summarized in Table III.
In order to validate the quality of the inclusive tag reconstruction and rule out possible biases introduced by the calibration method, we study the hadronic two-body decay of B + → D 0 π + with D 0 → K + π − . Due to the absence of any neutrino in this decay, we are able to fully reconstruct the B + four-vector and boost the prompt π + into the B + rest frame. Alternatively, we use the ROE, as outlined in the previous section, to reconstruct the very same information. Comparing the results from both allows us to determine if the calibration introduces potential biases and also to validate the signal resolution predicted in the simulation. In addition, we use this data set to test the validity of the continuum suppression and the data-driven continuum corrections  outlined in Section II.
We reconstruct the B + → D 0 π + with D 0 → K + π − using the same impact parameter requirements used in the B + → µ + ν µ analysis. For the prompt π + candidate we require a momentum of more than 2.1 GeV in the c.m. frame. For the D 0 decay product candidates a looser requirement is imposed, selecting charged tracks with a three-momentum of at least 0.3 GeV in the laboratory frame. To identify the kaon and pion candidates, we use the particle identification methods described in Section II. To further suppress contributions from background processes we require that the reconstructed D 0 mass is to be within 50 MeV of its expected value. Using the reconstructed four-vector of the B + → D 0 π + candidate we impose additional cuts to enhance the purity of the selected sample by using the beam-constrained mass and energy difference: Here p * B + and E * B + denote the reconstructed B + threemomentum and energy in the c.m. frame of the colliding e + e − -pair, respectively. The inclusive tag is reconstructed in the same way as outlined in the previous section and Fig. 5 shows the reconstructed prompt π + absolute three-momentum p B π after using the inclusive tag information to boost into the B + meson frame of rest. The simulated and reconstructed B + → D 0 π + decays show good agreement. Using the signal side information, we also reconstruct the residual ∆p B π = p B π − p B sig π , with p B sig π denoting the absolute three-momentum in the B + rest frame when reconstructed using the signal-side B + decay chain. The mean and variance of this distribution between simulated and reconstructed sample show good agreement and are compatible within their statistical uncertainties. We obtain a data-driven estimate for the inclusive tag resolution for p B π of 0.11 GeV. To validate the response of the multivariate classifier used to suppress continuum events, we remove the reconstructed D 0 decay products from the signal side to emulate the B + → µ + ν µ decay topology. Using the same BDT weights as for B + → µ + ν µ we then recalculate the classifier output C out . Its distribution is shown in Fig. 5 and simulated and reconstructed events are in good agreement. In Table IV we further compare the selection efficiency denoted as between simulated and reconstructed events for the four signal selection categories of the B + → µ + ν µ analysis. The efficiency is defined as the fraction of reconstructed candidates with C out > 0.93 or 0.98, respectively, with respect to the total number of reconstructed candidates. The efficiency from simulated and reconstructed events are in agreement within their statistical uncertainty and we do not assign additional corrections or uncertainties to the B + → µ + ν µ analysis in the following.

V. STATISTICAL ANALYSIS AND LIMIT SETTING PROCEDURE
In order to determine the B + → µ + ν µ or B + → µ + N signal yield and to constrain all background yields, we perform a simultaneous binned likelihood fit to the p B µ spectra using the four event categories defined in Section III. The total likelihood function we consider has the form FIG. 5. The p B π distribution and the residual ∆p B π for B + → D 0 π + decays with D 0 → K + π − are shown in the reconstructed rest frame of the B + meson. The p B π distribution is derived from the inclusive tag reconstruction method described in the text and the residual shows the difference with respect to using the full B + decay chain to determine the same information. In addition, the continuum classifier of simulated and reconstructed collision events are compared.
with the individual category likelihoods L c and nuisanceparameter (NP) constraints G k . The product in Eq. 11 runs over all categories c and fit components k, respectively. The role of the NP constraints is detailed in Section VI. Each category likelihood L c is defined as the product of individual Poisson distributions P, with n i denoting the number of observed data events and ν i the total number of expected events in a given bin i. We divide the muon momentum spectrum into 22 equal bins of 50 MeV, ranging over p B µ ∈ [2.2, 3.3) GeV, and the number of expected events in a given bin, ν i is estimated using simulated collision events. It is given by with η k the total number of events from a given process k with the fraction f ik of such events being reconstructed in the bin i. The likelihood Eq. 11 is numerically maximized to fit the value of four different components, η k using the observed events using the sequential least squares programming method implementation of Ref. [42].
2. Background b → u ν Events; simulated as described in Section II.
Two additional background components, B + → µ + ν µ γ and other rare b → s processes, are constrained in the fit to the measurement of Ref. [37] and world averages of Ref. [5]. Both mimic the signal shape and are allowed to vary in the fit within their corresponding experimental uncertainties. Further details on how this is implemented are found in Section VI. We construct confidence levels for the components using the profile likelihood ratio method. For a given component η k the ratio is where η k , η, θ are the values of the component of interest, the remaining components, and a vector of nuisance parameters that unconditionally maximize the likelihood function, whereas the remaining components η η k and nuisance parameters θ η k maximize the likelihood under the condition that the component of interest is kept fixed at a given value η k . In the asymptotic limit, the test statistic Eq. 14 can be used to construct approximate confidence intervals (CI) through with f χ 2 (x; 1 dof) denoting the χ 2 distribution with a single degree of freedom. In the absence of a significant signal, we determine Frequentist and Bayesian limits. For the Frequentist one-sided (positive) limit, we modify our test statistic according to Ref. [43,44] to to maximize our sensitivity. This test statistic is asymptotically distributed as and with an observed value q obs 0 we evaluate the (local) probability of an observed signal, p 0 , as For the Bayesian limit, we convert the likelihood Eq. 11 using a vector of observed event yields in the given bins of all categories n (denoted as L = L(n|η k ) in the following) into a probability density function F of the parameter of interest η k using a flat prior π(η k ) to exclude unphysical negative branching fractions. This L is numerically maximized for given values of the parameter of interest η k , by floating the other components and nuisance parameters. The probability density function F is then given by with the prior π(η k ) = constant for η k ≥ 0 and zero otherwise.
To quote the significance over the background-only hypothesis for the search for B + → µ + ν µ and B + → µ + N we adapt Eq. 16 and set η k = 0. For the search for a heavy sterile neutrino we do not account for the lookelsewhere effect.
We validate the fit procedure using ensembles of pseudoexperiments generated for different input branching fractions for B + → µ + ν µ and B + → µ + N decays and observe no biases, undercoverage or overcoverage of CI.
Using a SM branching fraction of B(B + → µ + ν µ ) = (4.3 ± 0.8) × 10 −7 , calculated assuming an average value of |V ub | = (3.94 ± 0.36)×10 −3 [5] we construct Asimov data sets for all four categories. These are used to determine the median expected significance of our analysis. We find a value of 2.4 +0.8 −0.9 standard deviations incorporating all systematic uncertainties and 2.6 +1.0 −0.9 standard deviations if we only consider statistical uncertainties. The quoted uncertainties on the median expected significance correspond to the 68% CL intervals.

VI. SYSTEMATIC UNCERTAINTIES
There are several systematic uncertainties that affect the search for B + → µ + ν µ and B + → µ + N . The most important uncertainty stems from the modeling of the dominant semileptonic b → u ν background decays. As we determine the overall normalization of these decays directly from the measured collision events, we only need to evaluate shape uncertainties. The most important here stem from the modeling of the B → π + ν , B → ρ + ν , and B → ω + ν form factors, the branching fractions for these processes, B → η + ν , B → η + ν and inclusive b → u ν decays. The uncertainty of the nonresonant b → u ν contributions in the hybrid model approach is estimated by changing the underlying model from DFN to BLNP. In addition, the uncertainty on the DFN parameters m 1S b and a are included in the shape uncertainty (see Section II). There is no sizable shape uncertainty contribution owing to either muon identification or track reconstruction. The second most important uncertainty for the reported results is from the shape of the continuum template: the off-resonance data sample, which was used to correct the simulated continuum events, introduces additional statistical uncertainties. We evaluate the size of these using a bootstrapping procedure. The b → c ν background near the kinematic endpoint for such decays is dominated by B → D + ν and B → D * + ν decays. We evaluate the uncertainties in the used BGL form factors and their branching fractions for both channels. For the B + → µ + ν µ signal, and the fixed backgrounds from B + → µ + ν µ γ and rare b → s processes, we also evaluate the impact on the efficiency of the lepton-identification uncertainties, the number of produced B meson pairs in the Belle data set, and the overall tracking efficiency uncertainty. In addition, we propagate the experimental uncertainty on the used B + → µ + ν µ γ branching fraction. The rare b → s/d template is dominated by B + → K 0 L π + events (which make up about 32% of all selected events) and we assign an uncertainty on the measured branching fraction and the two next-most occurring decay channels, B + → K + π 0 (5%) and B 0 → ρ + π − (4%), in the template. The statistical uncertainty on the generated MC samples is also evaluated and taken into account. A full listing of the systematic uncertainties is found in Table V.
The effect of systematic uncertainties is directly incorporated into the likelihood function. For this we introduce a vector of NPs, θ k , for each fit template k. Each vector element represents one bin of the fitted p B µ spectrum in all four categories. The NPs are constrained in the likelihood Eq. 11 using multivariate Gaussian distributions G k = G k (0; θ k , Σ k ), with Σ k denoting the systematic covariance matrix for a given template k. The systematic covariance is constructed from the sum over all possible uncertainty sources affecting a template k, i.e.
with Σ ks the covariance matrix of error source s which depends on an uncertainty vector σ ks . The vector ele- Fractional uncertainty Source of uncertainty  ments of σ ks represent the absolute error in bins of p B µ of fit template k across the four event categories. We treat uncertainties from the same error source either as fully correlated, or, for MC or other statistical uncertainties as uncorrelated, such that Σ ks = σ ks ⊗ σ ks or Σ ks = Diag σ ks 2 . The impact of nuisance parameters is included in Eq. 13 as follows. First, the fractions f ik for all templates are rewritten as to take into account shape uncertainties. These uncertainties are listed as 'Additive uncertainties' in Table V.
Here θ ik represents the NP vector element of bin i and η MC ik the expected number of events in the same bin for event type k as estimated from the simulation. Note that this notation absorbs the size of the absolute error into the definition of the NP. Second, we include for the B + → µ + ν µ signal template and fixed background tem-plates overall efficiency and luminosity related uncertainties: this is achieved by rewriting the relevant fractions as with θ ks the NP parameterizing the uncertainty in question. The uncertainty sources treated this way include the overall lepton identification and track reconstruction efficiency uncertainty and the uncertainty on the number of B meson pairs produced in the full Belle data set and are labeled as 'Multiplicative uncertainties' in Table V.
For the fixed background templates the corresponding uncertainties from branching fractions are also included this way.

VII. b → u ν AND OFF-RESONANCE CONTROL REGION
To test the simulation of the crucial semileptonic b → u ν background, we construct a signal-depleted region with moderate continuum contamination. This is achieved by selecting events with continuum suppression classifier values of C out ∈ [0.9, 0.93). In this sample, the region of high muon momentum p B µ is used to test the validity of the continuum description and the region with a muon momentum between 2.2 and 2.6 GeV is dominated by semileptonic b → u ν and b → c ν decays. To also test the modeling of both backgrounds with respect to the employed signal categorization exploiting the angle between the muon and the signal B meson, we further split the selected events using cos Θ Bµ > 0 and cos Θ Bµ < 0. The full likelihood fit procedure including all systematic uncertainties detailed in Sections V and VI is then carried out. Figure 6 depicts the fit result: the individual contributions are shown as histograms and the recorded collision events are displayed as data points. The size of the systematic uncertainties is shown on the histograms as a hatched band. In the fit the signal B + → µ + ν µ yield was fixed to the SM expectation and in both categories we expect about 15 B + → µ + ν µ events. Both the b → u ν and b → c ν , and continuum dominated regions are described well by the fit templates. Assuming that for most bins the statistical uncertainty is approximately Gaussian, we calculate a χ 2 of 30.4 over 41 degrees of freedom by comparing predicted and observed yields in each bin and by taking into account the full systematic uncertainties. This approximation is justified for most of the p B µ region, but breaks down for the high momentum bins due to low statistics. The value still gives an indication that the fit model is able to describe the observed data well. We also carry out a fit in which the B + → µ + ν µ signal template is allowed to float: we determine a value −37 ± 61 events, which is compatible with the SM expectation.
For the inclusive b → u ν branching fraction in which the signal template is kept fixed at its SM expectation, where the uncertainty corresponds to the statistical error. The central value is compatible with the world average of Ref. [5], B(B → X u + ν) = (2.13 ± 0.31) × 10 −3 . Note that Ref. [5] inflated the quoted uncertainty to account for incompatibilities between the measurements used in the average.
We also apply the signal continuum classifier selection of C out ∈ [0.93, 1) on the recorded off-resonance data. With these events we carry out a two-component fit, determining the yields of B + → µ + ν µ signal and continuum events. This allows us to determine whether the classifier selection could cause a sculpting of the background shape, which in turn would result in a spurious signal. The low number of events passing the selection does not allow further categorization of the events using angular information as only 39 off-resonance events pass the selection. We fit 37 ± 10 background events and 2 ± 7 signal events.

VIII. RESULTS
In Fig. 7 the muon momentum spectrum in the B rest frame p B µ for the four signal categories is shown. The selected data events were used to maximize the likelihood Eq. 11: in total 4 × 22 bins with 4 × 132 NPs parameterizing systematic uncertainties are determined. In Appendix A a full breakdown of the NP pulls is given. The recorded collision data are shown as data points and the fitted B + → µ + ν µ signal and background components are displayed as colored histograms. The size of the systematic uncertainties is shown on the histograms as a hatched band. We observe for the B + → µ + ν µ branching fraction a value of with the first uncertainty denoting the statistical error and the second is from systematics. Fig. 8 shows the profile likelihood ratio Λ(ν sig ) (cf. Eq. 14). Assuming that all bins are described with approximately Gaussian uncertainty and including systematics with their full covariance, we calculate a χ 2 value of 58.8 with 84 degrees of freedom using the predicted and observed bin values. The observed significance over the background-only hypothesis using the one-sided test statistics Eq. 16 is 2.8 standard deviations. This is in agreement with the median SM expectation of 2.4 +0.8 −0.9 standard deviations, cf. Section V.
From the observed branching fraction we determine in combination with the B meson decay constant f B a value for the CKM matrix element |V ub |. Using f B = 184 ± 4 MeV [2] we find where the first uncertainty is the statistical error, the second from systematics and the third from theory. This value is compatible with both exclusive and inclusive measurements of |V ub | [5]. Due to the low significance of the observed B + → µ + ν µ signal, we calculate Bayesian and Frequentist upper limits of the branching fraction. We convert the likelihood into a Bayesian probability density function (PDF) using the procedure detailed in Section V and Eq. 19: Figure 9 shows the one-dimensional PDF, which was obtained using a flat prior in the partial branching fraction. The resulting Bayesian upper limit for B + → µ + ν µ at 90% confidence level (CL) is The Frequentist upper limit is determined using fits to ensembles of Asimov data sets with NPs shifted to the observed best fit values. Figure 9 shows the corresponding Frequentist likelihood, for convenience also converted into a PDF (blue dotted line) and the resulting upper limit at 90% CL is   The observed branching fraction is used to constrain the allowed parameter space of the two-Higgs-doublet model (2HDM) of type II and type III. In these models the presence of charged Higgs bosons as a new mediator with specific couplings would modify the observed branching fraction, cf. Fig. 1. The effect of the charged Higgs boson in the type II model is included in the expected B + → µ + ν µ branching fraction by modifying Eq. 1 according to Ref. [45] to with B SM denoting the SM branching fraction, tan β being the ratio of the vacuum expectation values of the two Higgs fields in the model, and m H + the mass of the charged Higgs boson. The type III model further generalizes the couplings to [46,47]  using the observed branching fraction Eq. 24 and by constructing a χ 2 test. For the SM branching fraction prediction we use B SM = (4.3 ± 0.8) × 10 −7 calculated assuming an average value of |V ub | = (3.94 ± 0.36)) × 10 −3 from Ref. [5]. Due to the explicit lepton mass dependence in the type III model, the constructed bounds on C µ L/R are more precise than any existing limits on C τ L/R based on results from studying B + → τ + ν τ decays.
To search for sterile neutrinos in B + → µ + N we fix the B + → µ + ν µ contribution to its SM value (B SM ) and search simultaneously in the four categories for an excess in the p B µ distributions. From the observed yields and our simulated predictions we calculate local p 0 values using the test statistic Eq. 16. The observed p 0 values are shown in Fig. 11 for sterile neutrino masses ranging from 0 -1.5 GeV, and no significant excess over the background-only SM hypothesis is observed. The largest deviation is seen at a mass of m N = 1 GeV with a significance of 1.8 σ. The result does not account for any corrections for the look-elsewhere effect. We also calculate the Bayesian upper limit on the branching fraction from the extracted signal yield of the B + → µ + N process with the B + → µ + ν µ contribution fixed to its SM value. The upper limit as a function of the sterile neutrino mass is also shown in Fig. 11. To compare the upper limit from the B + → µ + N process to previous searches [48 ? -54] for sterile neutrinos we calculate the excluded values of the coupling U µN 2 and the sterile neutrino mass m N using [55] If the SM process is accounted for, no significant excess is observed. The largest deviation from the background-only hypothesis is at m N = 1 GeV. No correction for the look-elsewhere effect is included. (top right) The Bayesian upper limit on the branching fraction as calculated from the sterile neutrino signal yield. The B + → µ + ν µ process is fixed to its SM expectation. (bottom) The excluded area in the coupling-mass plane from this search in comparison to previous searches for sterile neutrinos.

IX. SUMMARY AND CONCLUSIONS
In this paper results for the improved search of the B + → µ + ν µ and B + → µ + N processes using the full Belle data set and an inclusive tag approach are shown. The measurement supersedes the previous result of Ref. [7] as it has a higher sensitivity and a more accurate modeling of the crucial semileptonic b → u ν background. The analysis is carried out in the approximate B rest frame of the signal B + → µ + ν µ decay, reconstructed from the remaining charged and neutral particles of the collision event. These are combined and calibrated to reconstruct the second B meson produced in the collision. In combination with the known beam properties the four-momentum of the signal B meson is then reconstructed and used to boost the reconstructed signal muon in the reference frame, where the signal B meson is at rest. This results in a better signal resolution and improved sensitivity in contrast to carrying out the search in the c.m. frame of the colliding e + e − -pair. The analysis is carried out in four analysis categories using the continuum suppression classifier and angular information of the B meson and the muon. The branching fraction is determined using a binned maximum likelihood fit of the muon momentum spectrum. Shape and normalization uncertainties from the signal and background templates are directly incorporated into the likelihood. We report an observed branching fraction of B(B + → µ + ν µ ) = (5.3 ± 2.0 ± 0.9) × 10 −7 , with a significance of 2.8 standard deviations over the background-only hypothesis. We also quote the corresponding 90% upper limit using Bayesian and Frequentist approaches and use the observed branching fraction to set limits on type II and type III two-Higgs-doublet models. We find B(B + → µ + ν µ ) < 8.9 × 10 −7 and B(B + → µ + ν µ ) < 8.6×10 −7 at 90% CL for the Bayesian and Frequentist upper limits, respectively. The type III constraints are the most precise determined to date. In addition, we use the reconstructed muon spectrum to search for the presence of a sterile neutrino created through the process of B + → µ + N and via a new mediator particle. No significant excess is observed for masses in the probed range of m N ∈ [0, 1.5) GeV. The largest excess is seen at a sterile neutrino mass of 1 GeV with a local significance of 1.8 standard deviations.