Measurements of Partial Branching Fractions of Inclusive $B \to X_u \, \ell^+\, \nu_{\ell}$ Decays with Hadronic Tagging

We present measurements of partial branching fractions of inclusive semileptonic $B \to X_u \, \ell^+\, \nu_{\ell}$ decays using the full Belle data set of 711 fb$^{-1}$ of integrated luminosity at the $\Upsilon(4S)$ resonance and for $\ell = e, \mu$. Inclusive semileptonic $B \to X_u \, \ell^+\, \nu_{\ell}$ decays are CKM suppressed and measurements are complicated by the large background from CKM-favored $B \to X_c \, \ell^+\, \nu_{\ell}$ transitions, which have a similar signature. Using machine learning techniques, we reduce this and other backgrounds effectively, whilst retaining access to a large fraction of the $B \to X_u \, \ell^+\, \nu_{\ell}$ phase space and high signal efficiency. We measure partial branching fractions in three phase-space regions covering about $31\%$ to $86\%$ of the accessible $B \to X_u \, \ell^+\, \nu_{\ell}$ phase space. The most inclusive measurement corresponds to the phase space with lepton energies of $E_\ell^B>1 $ GeV, and we obtain $\Delta \mathcal{B}(B \to X_u \ell^+ \, \nu_\ell) = \left( 1.59 \pm 0.07 \pm 0.16 \right) \times 10^{-3}$ from a two-dimensional fit of the hadronic mass spectrum and the four-momentum-transfer squared distribution, with the uncertainties denoting the statistical and systematic error. We find $\left| V_{ub} \right| = \left( 4.10 \pm 0.09 \pm 0.22 \pm 0.15 \right) \times 10^{-3}$ from an average of four calculations for the partial decay rate with the third uncertainty denoting the average theory error. This value is higher but compatible with the determination from exclusive semileptonic decays within 1.3 standard deviations. In addition, we report charmless inclusive partial branching fractions separately for $B^+$ and $B^0$ mesons as well as for electron and muon final states. No isospin breaking or lepton flavor universality violating effects are observed.


I. INTRODUCTION
Precision measurements of the absolute value of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element $V_{ub}$ are important to challenge the Standard Model of particle physics (SM) [1,2]. In the SM, the CKM matrix is a $3 \times 3$ unitary matrix and is responsible for the known charge-parity (CP) violating effects in the quark sector [3]. There are indications of CP violation in the neutrino sector [4], but it remains unclear whether both sources of CP violation are sufficient to explain the matter dominance of today's universe. This motivates the search for new sources of CP-violating phenomena. If such sources exist in the form of heavy exotic particles that couple to quarks, their presence might alter the measurements that constrain the unitarity of the CKM matrix [5]. Precise measurements of $|V_{ub}|$ and the CKM angle $\gamma = \phi_3$ are imperative to isolate such effects: they involve tree-level processes, which are expected to remain unaffected by new physics and thus provide an unbiased measure of the amount of CP violation due to the Kobayashi-Maskawa (KM) mechanism [2] alone.
* cao@physik.uni-bonn.de, † florian.bernlochner@uni-bonn.de, ‡ now at Hiroshima University
Charmless semileptonic decays of $B$ mesons provide a clean avenue to measure $|V_{ub}|$, as their decay rate is theoretically better understood than purely hadronic transitions and their decay signature is more accessible than that of leptonic $B$ meson decays. The existing measurements either focus on exclusive final states, with $B \to \pi \, \ell^+ \, \nu_\ell$ [6] and the ratio of $\Lambda_b \to p \, \mu^+ \, \nu_\mu$ and $\Lambda_b \to \Lambda_c \, \mu^+ \, \nu_\mu$ [7] providing the most precise measurements to date, or reconstruct the $B \to X_u \, \ell^+ \, \nu_\ell$ decay fully inclusively. Charge conjugation is implied throughout this paper. In addition, $B \to X_u \, \ell^+ \, \nu_\ell$ is defined as the average branching fraction of charged and neutral $B$ meson decays and $\ell = e$ or $\mu$. Central to both approaches are reliable predictions of the (partial) decay rates $\Delta\Gamma(B \to X_u \, \ell^+ \, \nu_\ell)$ (omitting the CKM factor) from theory to convert measured (partial or full) branching fractions, $\Delta\mathcal{B}(B \to X_u \, \ell^+ \, \nu_\ell)$, into measurements of $|V_{ub}|$ via
$$ |V_{ub}| = \sqrt{ \frac{\Delta\mathcal{B}(B \to X_u \, \ell^+ \, \nu_\ell)}{\tau_B \, \Delta\Gamma(B \to X_u \, \ell^+ \, \nu_\ell)} } , $$
with $\tau_B$ denoting the $B$ meson lifetime. For exclusive measurements, the non-perturbative parts of the decay rates can be reliably predicted by lattice QCD [8] or light-cone sum rules [9] and constrained by the measurements of the decay dynamics.
The determination of $|V_{ub}|$ using inclusive decays is very challenging due to the large background from the CKM-favored $B \to X_c \, \ell^+ \, \nu_\ell$ process. Both processes have a very similar decay signature in the form of a high-momentum lepton, a hadronic system, and missing energy from the neutrino that escapes detection. Figure 1 shows an illustration of both processes for a $B^0$-meson decay. A clear separation of the processes is only possible in kinematic regions where $B \to X_c \, \ell^+ \, \nu_\ell$ is kinematically forbidden. In these regions, however, non-perturbative shape functions enter the description of the decay dynamics, making predictions for the decay rates dependent on the precise modeling. These functions parametrize at leading order the Fermi motion of the $b$ quark inside the $B$ meson. Properties of the leading-order $\Lambda_{\mathrm{QCD}}/m_b$ shape function can be determined using the photon energy spectrum of $B \to X_s \gamma$ decays and moments of the lepton energy or hadronic invariant mass in semileptonic $B$ decays [10][11][12], but the modeling of both the leading and subleading shape functions introduces large theory uncertainties on the decay rate. In the future, more model-independent approaches aim to directly measure the leading-order shape function [13,14].
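The conversion from a partial branching fraction to $|V_{ub}|$ described above can be illustrated with a minimal sketch. The branching fraction is the central value quoted in the abstract; the lifetime and the partial rate are placeholder inputs chosen only for illustration (assumptions, not values taken from this text), and uncertainties are ignored.

```python
import math

def vub_from_partial_bf(delta_bf, tau_b_ps, delta_gamma_per_ps):
    """|V_ub| = sqrt(dB / (tau_B * dGamma)), with dGamma given without the CKM factor."""
    return math.sqrt(delta_bf / (tau_b_ps * delta_gamma_per_ps))

delta_bf = 1.59e-3          # central value from the abstract (E_l^B > 1 GeV)
tau_b_ps = 1.579            # ASSUMPTION: average B lifetime in ps
delta_gamma_per_ps = 60.0   # ASSUMPTION: hypothetical predicted partial rate in ps^-1

vub = vub_from_partial_bf(delta_bf, tau_b_ps, delta_gamma_per_ps)
print(f"|V_ub| = {vub:.2e}")
```

Because $|V_{ub}|$ scales with the inverse square root of the predicted rate, a relative theory uncertainty on $\Delta\Gamma$ enters $|V_{ub}|$ with half its size.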
As such methods are not yet realized, it is beneficial to extend the measurement region as far as possible into the $B \to X_c \, \ell^+ \, \nu_\ell$ dominated phase space. This was done, e.g., in Refs. [15,16]. Doing so reduces the theory uncertainties on the predicted partial rates [17][18][19][20][21][22], although it makes the measurement more prone to systematic uncertainties. This strategy is also adopted in the measurement described in this paper.
The corresponding world averages of $|V_{ub}|$ from exclusive and inclusive determinations are [6]:
$$ |V_{ub}^{\mathrm{excl.}}| = \left( 3.67 \pm 0.09 \pm 0.12 \right) \times 10^{-3} , \qquad |V_{ub}^{\mathrm{incl.}}| = \left( 4.32 \pm 0.12 \, ^{+0.12}_{-0.13} \right) \times 10^{-3} . $$
Here the uncertainties are experimental and from theory. The two world averages disagree by about 3 standard deviations. This disagreement limits the reach of present-day precision tests of the KM mechanism and searches for loop-level new physics, see e.g. Ref. [23] for a recent analysis. For a more complete review the interested reader is referred to Refs. [24,25].
One important experimental method to extend the probed $B \to X_u \, \ell^+ \, \nu_\ell$ phase space into regions dominated by $B \to X_c \, \ell^+ \, \nu_\ell$ transitions is the full reconstruction of the second $B$ meson of the $e^+ e^- \to \Upsilon(4S) \to B \bar{B}$ process. This procedure is referred to as "tagging" and allows for the reconstruction of the hadronic $X$ system of the semileptonic process. In addition, the neutrino four-momentum can be reconstructed. Properties of both are instrumental to distinguish $B \to X_u \, \ell^+ \, \nu_\ell$ from $B \to X_c \, \ell^+ \, \nu_\ell$ processes. In this manuscript the reconstruction of the second $B$ meson and the separation of $B \to X_u \, \ell^+ \, \nu_\ell$ from $B \to X_c \, \ell^+ \, \nu_\ell$ processes were carried out using machine learning approaches. Several neural networks were trained to identify correctly reconstructed tag-side $B$ mesons. The distinguishing variables of the classification algorithm were carefully selected in order not to introduce a bias in the measured partial branching fractions. In addition, the modeling of backgrounds was validated in $B \to X_c \, \ell^+ \, \nu_\ell$ enriched selections. We report the measurement of three partial branching fractions, covering 31% to 86% of the accessible $B \to X_u \, \ell^+ \, \nu_\ell$ phase space. The measurement of fully differential distributions, which allow one to determine the leading and subleading shape functions, is left for future work.
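The size of the tension between the two world averages quoted above can be checked with a short calculation. This is a rough sketch that assumes all uncertainties are uncorrelated and Gaussian, and that uses the lower theory uncertainty of the inclusive value because the exclusive value lies below it; the published comparison may treat correlations differently.

```python
import math

def quadrature(*errors):
    """Combine uncorrelated uncertainties in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

# Values in units of 10^-3, as quoted in the text.
excl, excl_err = 3.67, quadrature(0.09, 0.12)
incl, incl_err = 4.32, quadrature(0.12, 0.13)   # lower theory error used

tension = (incl - excl) / quadrature(excl_err, incl_err)
print(f"tension = {tension:.1f} standard deviations")
```

Under these assumptions the result is slightly below 3 standard deviations, in line with the "about 3" stated in the text.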
The main improvements over the previous Belle result of Ref. [16] are the adoption of a more efficient tagging algorithm for the reconstruction of the second $B$ meson and improved descriptions of the $B \to X_u \, \ell^+ \, \nu_\ell$ signal and the $B \to X_c \, \ell^+ \, \nu_\ell$ background. In addition, the full Belle data set of 711 fb$^{-1}$ is analyzed and we avoid the direct use of kinematic properties of the candidate semileptonic decay in the background suppression. After the final selection we retain approximately 1.8 times more signal events than the previous analysis, with a purity that is improved by about 20%.
The remainder of this manuscript is organized as follows: Section II provides an overview of the data set and the simulated signal and background samples that were used in the analysis. Section III details the analysis strategy and the reconstruction of the hadronic $X$ system of the semileptonic decay. Section IV introduces the fit procedure used to separate the $B \to X_u \, \ell^+ \, \nu_\ell$ signal from background contributions. Section V lists the systematic uncertainties affecting the measurements and Section VI summarizes sideband studies central to validating the modeling of the crucial $B \to X_c \, \ell^+ \, \nu_\ell$ background processes. Finally, Section VII shows the selected signal events and compares them with the expectation from simulation. In Section VIII the measured partial branching fractions and subsequent values of $|V_{ub}|$ are discussed. Section IX presents our conclusions.

II. DATA SET AND SIMULATED SAMPLES
The analysis utilizes the full Belle data set of $(772 \pm 10) \times 10^{6}$ $B$ meson pairs, which were produced at the KEKB accelerator complex [26] with a center-of-mass energy of $\sqrt{s} = 10.58$ GeV corresponding to the $\Upsilon(4S)$ resonance. In addition, 79 fb$^{-1}$ of collision events recorded 60 MeV below the $\Upsilon(4S)$ resonance peak are used to derive corrections and for cross-checks.
The Belle detector is a large-solid-angle magnetic spectrometer that consists of a silicon vertex detector, a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), a barrel-like arrangement of time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter composed of CsI(Tl) crystals (ECL) located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux return located outside of the coil is instrumented to detect $K^0_L$ mesons and to identify muons (KLM). A more detailed description of the detector and its layout and performance can be found in Ref. [27] and in references therein.
Charged tracks are identified as electron or muon candidates by combining the information of multiple subdetectors into a lepton identification likelihood ratio, $\mathcal{L}_{\mathrm{LID}}$. For electrons, the most important identifying features are the ratio of the energy deposition in the ECL to the reconstructed track momentum, the energy loss in the CDC, the shower shape in the ECL, the quality of the geometrical matching of the track to the shower position in the ECL, and the photon yield in the ACC [28]. Muon candidates can be identified from charged track trajectories extrapolated to the outer detector. The most important identifying features are the difference between expected and measured penetration depth as well as the transverse deviation of KLM hits from the extrapolated trajectory [29]. Charged tracks are identified as pions or kaons using a likelihood ratio $\mathcal{L}^{K/\pi}_{\mathrm{ID}} = \mathcal{L}^{K}_{\mathrm{ID}} / ( \mathcal{L}^{K}_{\mathrm{ID}} + \mathcal{L}^{\pi}_{\mathrm{ID}} )$. For low-momentum particles with transverse momentum below 1 GeV in the laboratory frame, the most important identifying features of the kaon ($\mathcal{L}^{K}_{\mathrm{ID}}$) and pion ($\mathcal{L}^{\pi}_{\mathrm{ID}}$) likelihoods are the recorded energy loss by ionization, $dE/dx$, in the CDC, and the time-of-flight information from the TOF. Higher-momentum kaon and pion classification relies on the Cherenkov light recorded in the ACC. To avoid the difficulties in understanding the efficiencies of reconstructing $K^0_L$ mesons, they are not explicitly reconstructed or used in this analysis.
Photons are identified as energy depositions in the ECL, vetoing clusters to which an associated track can be assigned. Only photons with an energy deposition of $E_\gamma > 100$ MeV, 150 MeV, and 50 MeV in the forward endcap, backward endcap, and barrel part of the calorimeter, respectively, are considered. We reconstruct $\pi^0$ candidates from pairs of photon candidates. The invariant mass is required to fall inside a window$^2$ of $m_{\gamma\gamma} \in [0.12, 0.15]$ GeV, which corresponds to about 2.5 times the $\pi^0$ mass resolution.
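The photon and $\pi^0$ selection described above can be sketched as follows. The energy thresholds and the mass window are the values quoted in the text; the function names, the region labels, and the example photons are hypothetical and only serve as an illustration.

```python
import math

# Minimum photon energies in GeV per calorimeter region, as quoted in the text.
PHOTON_E_MIN = {"forward": 0.100, "backward": 0.150, "barrel": 0.050}
PI0_WINDOW = (0.12, 0.15)  # m_gammagamma window in GeV

def passes_photon_threshold(energy, region):
    """True if a cluster energy exceeds the region-dependent threshold."""
    return energy > PHOTON_E_MIN[region]

def diphoton_mass(e1, dir1, e2, dir2):
    """Invariant mass of two massless photons with energies e1, e2 and unit directions."""
    cos_theta = sum(a * b for a, b in zip(dir1, dir2))
    return math.sqrt(max(0.0, 2.0 * e1 * e2 * (1.0 - cos_theta)))

def is_pi0_candidate(e1, dir1, e2, dir2):
    mass = diphoton_mass(e1, dir1, e2, dir2)
    return PI0_WINDOW[0] <= mass <= PI0_WINDOW[1]

# Hypothetical example: two 0.5 GeV photons with an opening angle of 0.27 rad.
theta = 0.27
d1 = (0.0, 0.0, 1.0)
d2 = (math.sin(theta), 0.0, math.cos(theta))
print(diphoton_mass(0.5, d1, 0.5, d2), is_pi0_candidate(0.5, d1, 0.5, d2))
```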
Monte Carlo (MC) samples of $B$ meson decays and continuum processes ($e^+ e^- \to q \bar{q}$ with $q = u, d, s, c$) are simulated using the EvtGen generator [30]. These samples are used to evaluate reconstruction efficiencies and acceptance, and to estimate background contaminations. The sample sizes correspond to approximately ten and five times the Belle collision data for $B$ meson decays and continuum processes, respectively. The interactions of particles traversing the detector are simulated using Geant3 [31]. Electromagnetic final-state radiation is simulated using the PHOTOS [32] package for all charged final-state particles. The efficiencies in the MC are corrected using data-driven methods to account for, e.g., differences in identification and reconstruction efficiencies.
$^2$ We use natural units: $\hbar = c = 1$.
The most important background processes are semileptonic $B \to X_c \, \ell^+ \, \nu_\ell$ decays and continuum processes, both of which can produce high-momentum leptons in a momentum range similar to the $B \to X_u \, \ell^+ \, \nu_\ell$ process.
The semileptonic background from $B \to X_c \, \ell^+ \, \nu_\ell$ decays is dominated by $B \to D \, \ell^+ \, \nu_\ell$ and $B \to D^* \, \ell^+ \, \nu_\ell$ decays. The $B \to D \, \ell^+ \, \nu_\ell$ decays are modeled using the BGL parametrization [33] with form factor central values and uncertainties taken from the fit in Ref. [34]. For $B \to D^* \, \ell^+ \, \nu_\ell$ we use the BGL implementation proposed by Refs. [35,36] with form factor central values and uncertainties from the fit to the measurement of Ref. [37]. Both backgrounds are normalized to the average branching fraction of Ref. [6] assuming isospin symmetry.
Semileptonic $B \to D^{**} \, \ell^+ \, \nu_\ell$ decays, with $D^{**} = \{ D_0^*, D_1^*, D_1, D_2^* \}$ denoting the four orbitally excited charmed mesons, are modeled using the heavy-quark-symmetry-based form factors proposed in Ref. [38]. We simulate all $D^{**}$ decays using masses and widths from Ref. [39]. For the branching fractions we adopt the values of Ref. [6] and correct them to account for missing isospin-conjugated and other established decay modes, following the prescription given in Ref. [38]. To correct for the fact that the measurements were carried out in the $D^{**0} \to D^{(*)+} \pi^-$ decay modes, we account for the missing isospin modes with a factor of
$$ f_\pi = \frac{\mathcal{B}(D^{**0} \to D^{(*)+} \pi^-)}{\mathcal{B}(D^{**0} \to D^{(*)} \pi)} = \frac{2}{3} . $$
The measurements of $B \to D_2^* \, \ell^+ \, \nu_\ell$ in Ref. [6] are converted to only account for the $D_2^{*0} \to D^{*-} \pi^+$ decay. To also account for $D_2^{*0} \to D^{-} \pi^+$ contributions, we apply a correction factor taken from Ref. [39]. The world average of $B \to D_1^* \, \ell^+ \, \nu_\ell$ given in Ref. [6] combines measurements that show poor agreement, and the resulting probability of the combination is below 0.01%. Notably, the measurement of Ref. [40] is in conflict with the measured branching fractions of Refs. [41,42] and with the expectation that $\mathcal{B}(B \to D_1^* \, \ell^+ \, \nu_\ell)$ is of similar size to $\mathcal{B}(B \to D_0^* \, \ell^+ \, \nu_\ell)$ [43,44]. We therefore perform our own average excluding Ref. [40] and use the resulting value. The world average of $B \to D_1 \, \ell^+ \, \nu_\ell$ does not include contributions from prompt three-body decays of $D_1 \to D \pi \pi$. We account for these using a correction factor from Ref. [45]. We subtract the contribution of $D_1 \to D \pi \pi$ from the measured non-resonant plus resonant $B \to D \pi \pi \, \ell^+ \, \nu_\ell$ branching fraction of Ref. [46]. To account for missing isospin-conjugated modes of the three-hadron final states we adopt the prescription from Ref. [46], which calculates an average isospin correction factor of
$$ f_{\pi\pi} = \frac{1}{2} \pm \frac{1}{6} . $$
The uncertainty takes into account the full spread of final states ($f_0(500) \to \pi\pi$ or $\rho \to \pi\pi$ result in $f_{\pi\pi} = 2/3$ and $1/3$, respectively) and the non-resonant three-body decays ($f_{\pi\pi} = 3/7$). We further assume that

B(D
For the remaining $B \to D^{(*)} \pi \pi \, \ell^+ \, \nu_\ell$ contributions we use the measured value of Ref. [46]. The remaining "gap" between the sum of all considered exclusive modes and the inclusive $B \to X_c \, \ell^+ \, \nu_\ell$ branching fraction ($\approx 0.8 \times 10^{-2}$ or 7-8% of the total $B \to X_c \, \ell^+ \, \nu_\ell$ branching fraction) is filled in equal parts with $B \to D \eta \, \ell^+ \, \nu_\ell$ and $B \to D^* \eta \, \ell^+ \, \nu_\ell$, and we assume a 100% uncertainty on this contribution. We simulate $B \to D^{(*)} \pi \pi \, \ell^+ \, \nu_\ell$ and $B \to D^{(*)} \eta \, \ell^+ \, \nu_\ell$ final states assuming that they are produced by the decay of two broad resonant states $D^{**}_{\mathrm{gap}}$ with masses and widths identical to those of $D_1^*$ and $D_0^*$. Although there is currently no experimental evidence for decays of charm 1P states into these final states or for the existence of such an additional broad state (e.g. a 2S) in semileptonic transitions, this description provides a better kinematic description of the initial three-body decay, $B \to D^{**}_{\mathrm{gap}} \, \ell^+ \, \nu_\ell$, than e.g. a model based on the equidistribution of all final-state particles in phase space. For the form factors we adapt Ref. [38].
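The average isospin correction factor for the three-hadron final states discussed above can be reproduced from the three quoted values. The sketch below assumes that "full spread" means the midpoint of the smallest and largest factor, with half the range as uncertainty; this reading is an assumption, and the dictionary keys are only labels.

```python
from fractions import Fraction

# Isospin factors quoted in the text for the different pi pi configurations.
factors = {
    "f0(500) -> pi pi": Fraction(2, 3),
    "rho -> pi pi": Fraction(1, 3),
    "non-resonant": Fraction(3, 7),
}

low, high = min(factors.values()), max(factors.values())
central = (low + high) / 2       # midpoint of the full spread
half_spread = (high - low) / 2   # assigned as uncertainty

print(f"f_pipi = {central} +/- {half_spread}")
```

The non-resonant value of 3/7 lies inside the interval spanned by the two resonant cases, so it does not change the envelope.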
Semileptonic $B \to X_u \, \ell^+ \, \nu_\ell$ decays are modeled as a mixture of specific exclusive modes and non-resonant contributions. We normalize their corresponding branching fractions to the world averages from Ref. [39]. Semileptonic $B \to \pi \, \ell^+ \, \nu_\ell$ decays are simulated using the BCL parametrization [47] with form factor central values and uncertainties from the global fit carried out by Ref. [48]. The processes $B \to \rho \, \ell^+ \, \nu_\ell$ and $B \to \omega \, \ell^+ \, \nu_\ell$ are modeled using the BCL form factor parametrization. We use the fit of Ref. [49], which combines the measurements of Refs. [50][51][52] with the light-cone sum rule predictions of Ref. [9] to determine a set of form factor central values and uncertainties. The processes $B \to \eta \, \ell^+ \, \nu_\ell$ and $B \to \eta' \, \ell^+ \, \nu_\ell$ are modeled using the LCSR calculation of Ref. [53]. For the uncertainties of these states we assume that the pole parameters $\alpha^{+/0}$ and the form factor normalization $f^{+}_{B\eta}(0)$ at maximum recoil can be treated as uncorrelated. In addition to these narrow resonances, we simulate non-resonant $B \to X_u \, \ell^+ \, \nu_\ell$ decays with at least two pions in the final state following the DFN model [54]. The triple differential rate of this model is a function of the four-momentum-transfer squared ($q^2$), the lepton energy ($E_\ell^B$) in the $B$ rest frame, and the hadronic invariant mass squared ($M_X^2$) of the $X_u$ system at next-to-leading-order precision in the strong coupling constant $\alpha_s$. This triple differential rate is convolved with a non-perturbative shape function using an ad-hoc exponential model. The free parameters of the model are the $b$ quark mass in the Kagan-Neubert scheme [55], $m_b^{\mathrm{KN}} = (4.66 \pm 0.04)$ GeV, and a non-perturbative parameter $a^{\mathrm{KN}} = 1.3 \pm 0.5$. The values of these parameters were determined in Ref. [56] from a fit to $B \to X_c \, \ell^+ \, \nu_\ell$ and $B \to X_s \gamma$ decay properties. At leading order, the non-perturbative parameter $a^{\mathrm{KN}}$ is related to the average momentum squared of the $b$ quark inside the $B$ meson and determines the second moment of the shape function. It is defined as $a^{\mathrm{KN}} = -3 \bar{\Lambda}^2 / \lambda_1 - 1$ with the binding energy $\bar{\Lambda} = m_B - m_b^{\mathrm{KN}}$ and the kinetic energy parameter $\lambda_1$. The hadronization of the parton-level $B \to X_u \, \ell^+ \, \nu_\ell$ DFN simulation is carried out using the JETSET algorithm [57], producing final states with two or more mesons.
The inclusive and exclusive $B \to X_u \, \ell^+ \, \nu_\ell$ predictions are combined using a so-called "hybrid" approach, a method originally suggested by Ref. [58]; our implementation closely follows Ref. [59] and uses the library of Ref. [60]. To this end, we combine both predictions such that the partial branching fractions in the triple differential rate of the inclusive ($\Delta\mathcal{B}^{\mathrm{incl}}_{ijk}$) and combined exclusive ($\Delta\mathcal{B}^{\mathrm{excl}}_{ijk}$) predictions reproduce the inclusive values. This is achieved by assigning weights $w_{ijk}$ to the inclusive contributions such that
$$ \Delta\mathcal{B}^{\mathrm{incl}}_{ijk} = \Delta\mathcal{B}^{\mathrm{excl}}_{ijk} + w_{ijk} \times \Delta\mathcal{B}^{\mathrm{incl}}_{ijk} , $$
with $i, j, k$ denoting the corresponding bin in the three dimensions of $q^2$, $E_\ell^B$, and $M_X$. To study the model dependence of the DFN shape function, we also determine weights using the BLNP model of Ref. [61] and later treat the difference as a systematic uncertainty. Table I summarizes the branching fractions for the signal and the important $B \to X_c \, \ell^+ \, \nu_\ell$ background processes that were used. Figure 2 shows the generator-level distributions and yields of $B \to X_c \, \ell^+ \, \nu_\ell$ and $B \to X_u \, \ell^+ \, \nu_\ell$ after the tag-side reconstruction (cf. Section III). The $B \to X_u \, \ell^+ \, \nu_\ell$ yields were scaled up by a factor of 50 to make them visible. A clear separation can be obtained at low values of $M_X$ and high values of $E_\ell^B$.
FIG. 2. The generator-level $E_\ell^B$ and $M_X$ distributions of the CKM suppressed and favored inclusive semileptonic processes, $B \to X_u \, \ell^+ \, \nu_\ell$ (scaled up by a factor of 50) and $B \to X_c \, \ell^+ \, \nu_\ell$, respectively, are shown, using the models described in the text.
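The hybrid weights described above follow from requiring that, in each bin, the exclusive prediction plus the down-weighted inclusive prediction reproduces the inclusive value. The sketch below is a minimal illustration with a flattened list of bins and made-up numbers; it is not the implementation of Ref. [60]. Setting the weight to zero in bins where the exclusive prediction exceeds the inclusive one is an assumption made here.

```python
def hybrid_weights(incl, excl):
    """Per-bin weights w such that excl + w * incl == incl wherever possible."""
    weights = []
    for b_incl, b_excl in zip(incl, excl):
        if b_incl <= 0.0:
            weights.append(0.0)  # nothing to re-weight in an empty bin
        else:
            # ASSUMPTION: clip at zero if the exclusive modes overshoot.
            weights.append(max(0.0, (b_incl - b_excl) / b_incl))
    return weights

# Hypothetical partial branching fractions (arbitrary units) in three bins.
incl = [4.0, 2.0, 1.0]
excl = [1.0, 2.5, 0.0]
w = hybrid_weights(incl, excl)
hybrid = [e + wi * i for e, wi, i in zip(excl, w, incl)]
print(w, hybrid)
```

In bins without clipping the hybrid sum equals the inclusive prediction exactly; in clipped bins the exclusive prediction alone is used.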

III. ANALYSIS STRATEGY, HADRONIC TAGGING, AND X RECONSTRUCTION

A. Neural Network Based Tag Side Reconstruction
We reconstruct collision events using the hadronic full reconstruction algorithm of Ref. [62]. The algorithm reconstructs one of the $B$ mesons produced in the collision event using hadronic decay channels. We label such $B$ mesons in the following as $B_{\mathrm{tag}}$. Instead of attempting to reconstruct as many $B$ meson decay cascades as possible, the algorithm employs a hierarchical reconstruction ansatz in four stages. At the first stage, neural networks are trained to identify charged tracks and neutral energy depositions as detector-stable particles ($e^+$, $\mu^+$, $K^+$, $\pi^+$, $\gamma$), neutral $\pi^0$ candidates, or $K^0_S$ candidates. At the second stage, these candidate particles are combined into heavier meson candidates ($J/\psi$, $D^0$, $D^+$, $D_s$) and for each target final state a neural network is trained to identify probable candidates. In addition to the classifier output from the first stage, vertex fit probabilities of the candidate combinations and the full four-momentum of the combination are passed to the input layer. At the third stage, candidates for $D^{*0}$, $D^{*+}$, and $D_s^{*}$ mesons are formed and separate neural networks are trained to identify viable combinations. The input layer aggregates the output classifiers from all previous reconstruction stages. The final stage combines the information from all previous stages to form $B_{\mathrm{tag}}$ candidates. The viability of such combinations is again assessed by a neural network that was trained to distinguish correctly reconstructed candidates from wrong combinations and whose output classifier score we denote by $\mathcal{O}_{\mathrm{FR}}$. Over 1104 decay cascades are reconstructed in this manner, achieving an efficiency of 0.28% and 0.18% for charged and neutral $B$ meson pairs [63], respectively. Finally, the output of this classifier is used as an input and combined with a range of event shape variables to train a neural network to distinguish reconstructed $B$ meson candidates from continuum processes. The output classifier score of this neural network is denoted as $\mathcal{O}_{\mathrm{Cont}}$. Both classifier scores are mapped to a range of $[0, 1)$, signifying the reconstruction quality from poor to excellent candidates. We retain $B_{\mathrm{tag}}$ candidates that show at least moderate agreement based on these two outputs and require that $\mathcal{O}_{\mathrm{FR}} > 10^{-4}$ and $\mathcal{O}_{\mathrm{Cont}} > 10^{-4}$. Despite these relatively low values, knowledge of the charge and momentum of the decay constituents in combination with the known beam energy allows one to infer the flavor and four-momentum of the $B_{\mathrm{tag}}$ candidate. We require the $B_{\mathrm{tag}}$ candidates to have at least a beam-constrained mass of
$$ M_{bc} = \sqrt{ E_{\mathrm{beam}}^2 - |\mathbf{p}_{\mathrm{tag}}|^2 } > 5.27 \ \mathrm{GeV} , $$
with $\mathbf{p}_{\mathrm{tag}}$ denoting the momentum of the $B_{\mathrm{tag}}$ candidate in the center-of-mass frame of the colliding $e^+ e^-$ pair. Furthermore, $E_{\mathrm{beam}} = \sqrt{s}/2$ denotes half the center-of-mass energy of the colliding $e^+ e^-$ pair. The energy difference
$$ \Delta E = E_{\mathrm{tag}} - E_{\mathrm{beam}} $$
is already used in the input layer of the neural network trained in the final stage of the reconstruction. Here $E_{\mathrm{tag}}$ denotes the energy of the $B_{\mathrm{tag}}$ candidate in the center-of-mass frame of the colliding $e^+ e^-$ pair. In each event a single $B_{\mathrm{tag}}$ candidate is then selected according to the highest $\mathcal{O}_{\mathrm{FR}}$ score of the hierarchical full reconstruction algorithm. All tracks and clusters not used in the reconstruction of the $B_{\mathrm{tag}}$ candidate are used to define the signal side.
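The two tag-side quantities discussed above, the beam-constrained mass and the energy difference, are simple functions of the center-of-mass energy and the $B_{\mathrm{tag}}$ kinematics. The following sketch uses $\sqrt{s} = 10.58$ GeV from Section II; the tag momentum and energy are hypothetical example values.

```python
import math

def beam_constrained_mass(sqrt_s, p_tag):
    """M_bc = sqrt(E_beam^2 - |p_tag|^2), with p_tag in the center-of-mass frame.

    Raises ValueError if |p_tag| exceeds the beam energy.
    """
    e_beam = sqrt_s / 2.0
    p_squared = sum(c * c for c in p_tag)
    return math.sqrt(e_beam ** 2 - p_squared)

def energy_difference(sqrt_s, e_tag):
    """Delta E = E_tag - E_beam, with E_tag in the center-of-mass frame."""
    return e_tag - sqrt_s / 2.0

SQRT_S = 10.58               # GeV, Upsilon(4S) resonance
p_tag = (0.20, 0.10, 0.22)   # hypothetical tag momentum in GeV
e_tag = 5.30                 # hypothetical tag energy in GeV

m_bc = beam_constrained_mass(SQRT_S, p_tag)
delta_e = energy_difference(SQRT_S, e_tag)
print(f"M_bc = {m_bc:.4f} GeV, Delta E = {delta_e:.3f} GeV")
```

Replacing the measured tag energy by the beam energy is what gives $M_{bc}$ a better resolution than the plain invariant mass of the candidate.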

B. Signal Side Reconstruction
The signal side of the event is reconstructed by identifying a well-reconstructed lepton with $E_\ell^B = |\mathbf{p}_\ell^B| > 1$ GeV in the signal $B$ rest frame. This frame is calculated from the momentum of the $B_{\mathrm{tag}}$ candidate via
$$ p_{\mathrm{sig}} = p_{e^+ e^-} - \left( \sqrt{ m_B^2 + |\mathbf{p}_{\mathrm{tag}}|^2 } , \, \mathbf{p}_{\mathrm{tag}} \right) , $$
with $p_{e^+ e^-}$ denoting the four-momentum of the colliding electron-positron pair. Leptons from $J/\psi$ decays and photon conversions in detector material are rejected by combining the lepton candidate with oppositely charged tracks ($t$) on the signal side and demanding that $m_{\ell t} > 0.14$ GeV and $m_{e t} \notin [3.05, 3.15]$ GeV or $m_{\mu t} \notin [3.06, 3.12]$ GeV. If multiple lepton candidates are present on the signal side, the event is discarded, as multiple leptons are likely to originate from a double semileptonic $b \to c \to s$ cascade. For charged $B_{\mathrm{tag}}$ candidates, we demand that the charge of the signal-side lepton be opposite to that of the $B_{\mathrm{tag}}$. The hadronic $X$ system is reconstructed from the remaining unassigned charged particles and neutral energy depositions. Its four-momentum is calculated as
$$ p_X = \sum_i \left( \sqrt{ m_\pi^2 + |\mathbf{p}_i|^2 } , \, \mathbf{p}_i \right) + \sum_j \left( E_j , \, \mathbf{k}_j \right) , $$
with $E_j = |\mathbf{k}_j|$ the energy of the neutral energy depositions; all charged particles with momentum $\mathbf{p}_i$ are assumed to be pions. With the $X$ system reconstructed, we can also reconstruct the missing mass squared,
$$ M^2_{\mathrm{miss}} = \left( p_{\mathrm{sig}} - p_X - p_\ell \right)^2 , $$
which should peak at zero, $M^2_{\mathrm{miss}} \approx m_\nu^2 \approx 0$ GeV$^2$, for correctly reconstructed semileptonic $B \to X_u \, \ell^+ \, \nu_\ell$ and $B \to X_c \, \ell^+ \, \nu_\ell$ decays. The hadronic mass of the $X$ system is later used to discriminate $B \to X_u \, \ell^+ \, \nu_\ell$ signal decays from $B \to X_c \, \ell^+ \, \nu_\ell$ and other remaining backgrounds. It is reconstructed using
$$ M_X = \sqrt{ (p_X)^\mu (p_X)_\mu } . $$
In addition, we reconstruct the four-momentum-transfer squared, $q^2$, as
$$ q^2 = \left( p_{\mathrm{sig}} - p_X \right)^2 . $$
The resolution of both variables for $B \to X_u \, \ell^+ \, \nu_\ell$ is shown in Figure 3 as residuals with respect to the generated values of $q^2$ and $M_X$. The resolution for $M_X$ has a root-mean-square (RMS) deviation of 0.47 GeV, but exhibits a large tail towards larger values. The distinct peak at 0 is from $B^0 \to \pi^- \, \ell^+ \, \nu_\ell$ and other low-multiplicity final states composed of only charged pions. The four-momentum-transfer squared $q^2$ has a broad resolution, which is caused by a combination of the tag-side $B$ and the $X$ reconstruction. The RMS deviation for $q^2$ is 1.59 GeV$^2$. The core resolution is dominated by the tagging resolution, whereas the large negative tail is dominated by the resolution of the reconstruction of the $X$ system.
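The signal-side observables described above can be sketched with plain four-vector arithmetic. Four-vectors are represented as tuples $(E, p_x, p_y, p_z)$ in GeV; charged particles are assigned the pion mass and neutral depositions are treated as massless, as stated in the text. The helper names and the example event are hypothetical.

```python
import math

M_PI = 0.13957  # charged pion mass in GeV

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def mass_sq(p):
    """Minkowski square E^2 - |p|^2 of a four-vector."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

def hadronic_system(charged, neutral):
    """Sum charged three-momenta (pion hypothesis) and neutral clusters (E = |k|)."""
    p_x = (0.0, 0.0, 0.0, 0.0)
    for p in charged:
        p_x = add(p_x, (math.sqrt(M_PI ** 2 + sum(c * c for c in p)), *p))
    for k in neutral:
        p_x = add(p_x, (math.sqrt(sum(c * c for c in k)), *k))
    return p_x

def kinematics(p_sig, p_lep, charged, neutral):
    """Return (M_X, q^2, M_miss^2) for one candidate."""
    p_x = hadronic_system(charged, neutral)
    m_x = math.sqrt(max(0.0, mass_sq(p_x)))
    q2 = mass_sq(sub(p_sig, p_x))
    mm2 = mass_sq(sub(sub(p_sig, p_x), p_lep))
    return m_x, q2, mm2

# Hypothetical event: signal B at rest, a single charged pion, one lepton.
p_sig = (5.279, 0.0, 0.0, 0.0)
p_lep = (1.5, 0.0, 0.0, -1.5)
m_x, q2, mm2 = kinematics(p_sig, p_lep, charged=[(0.0, 0.0, 2.0)], neutral=[])
print(m_x, q2, mm2)
```

For a single charged pion the reconstructed $M_X$ equals the pion mass, which corresponds to the distinct peak in the $M_X$ residuals mentioned above.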

C. Background Suppression BDT
At this point in the reconstruction, the $B \to X_c \, \ell^+ \, \nu_\ell$ process completely dominates the selected events. To identify $B \to X_u \, \ell^+ \, \nu_\ell$ decays, we combine several distinguishing features into a single discriminant. This is achieved by using a machine-learning-based classification with boosted decision trees (BDTs). All momenta are given in the center-of-mass frame of the colliding $e^+ e^-$ pair. The features are:
1. $M^2_{\mathrm{miss}}$: The average $B \to X_c \, \ell^+ \, \nu_\ell$ multiplicity is higher than that of $B \to X_u \, \ell^+ \, \nu_\ell$, broadening the missing mass squared distribution.
2. $D^*$ veto: Due to the small available phase space from the small mass difference between the $D^*$ and $D$ mesons, the flight direction of the slow pion from a $D^* \to D \pi$ decay is strongly correlated with the $D^*$ momentum direction. The energy and momentum of a $D^*$ candidate can thus be approximated from the slow pion alone. Three variables are derived from this approximation; they are used exclusively for events with charged and neutral slow pion candidates.
3. Kaons: We identify the number of $K^+$ candidates using the particle-identification likelihood, cf. Section II. In addition, we reconstruct $K^0_S$ candidates from displaced tracks found in the $X$ system.
4. $B_{\mathrm{sig}}$ vertex fit: The charmed mesons produced in $B \to X_c \, \ell^+ \, \nu_\ell$ transitions have a longer lifetime than their charmless counterparts produced in $B \to X_u \, \ell^+ \, \nu_\ell$ decays. This can be exploited by carrying out a vertex fit using the lepton and all charged constituents of the $X$ system that are not identified as kaons; we use its $\chi^2$ value as a discriminator.
5. $Q_{\mathrm{tot}}$: The total event charge as calculated from the $X$ system plus lepton on the signal side and from the $B_{\mathrm{tag}}$ constituents. Due to the larger average multiplicity of $B \to X_c \, \ell^+ \, \nu_\ell$, the expected net zero event charge is more often violated than in $B \to X_u \, \ell^+ \, \nu_\ell$ candidate events.
We use the BDT implementation of Ref. [64] and train a classifier $\mathcal{O}_{\mathrm{BDT}}$ with simulated $B \to X_u \, \ell^+ \, \nu_\ell$ and $B \to X_c \, \ell^+ \, \nu_\ell$ events, which we discard in the later analysis. Ref. [64] uses optimized boosting and pruning procedures to maximize the classification performance. We choose a selection criterion on $\mathcal{O}_{\mathrm{BDT}}$ that rejects 98.7% of $B \to X_c \, \ell^+ \, \nu_\ell$ events and retains 18.5% of the $B \to X_u \, \ell^+ \, \nu_\ell$ signal. This working point was chosen by maximizing the significance of the most inclusive partial branching fraction, taking into account the full set of systematic uncertainties and the full analysis procedure. The stability of the result as a function of the BDT selection is further discussed in Section VIII.
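How a cut on a classifier score translates into a signal efficiency and a background rejection can be illustrated with a small sketch. It does not reproduce the significance optimization described above; it only shows the bookkeeping for a fixed target efficiency. The score lists are synthetic and the function names are hypothetical.

```python
def threshold_for_signal_efficiency(signal_scores, target_efficiency):
    """Smallest score of the best-scoring fraction `target_efficiency` of signal events."""
    ordered = sorted(signal_scores, reverse=True)
    n_keep = max(1, round(target_efficiency * len(ordered)))
    return ordered[n_keep - 1]

def fraction_passing(scores, threshold):
    """Fraction of events with a score at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

# Synthetic classifier outputs: signal spread over [0, 1), background at low scores.
signal = [i / 100 for i in range(100)]
background = [i / 200 for i in range(100)]

cut = threshold_for_signal_efficiency(signal, 0.2)
signal_efficiency = fraction_passing(signal, cut)
background_rejection = 1.0 - fraction_passing(background, cut)
print(cut, signal_efficiency, background_rejection)
```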
[FIG. 4 caption, partially recovered: the classifier output for the $B \to X_u \, \ell^+ \, \nu_\ell$ signal, the $B \to X_c \, \ell^+ \, \nu_\ell$ background, and all other contributions. To increase visibility, the $B \to X_u \, \ell^+ \, \nu_\ell$ component is shown with a scaling factor (red dashed line). The uncertainties on the MC contain the full systematic errors and are further discussed in Section V.]
[Table caption, partially recovered: $B \to X_c \, \ell^+ \, \nu_\ell$ background for the $M_{bc}$ and the BDT selections.]
Figure 4 shows the output classifier of the background suppression BDT for MC and data. The classifier output shows good agreement between simulated and observed data, with the exception of the first two signal-depleted bins. A comparison of the shape of all input variables for $B \to X_u \, \ell^+ \, \nu_\ell$ and $B \to X_c \, \ell^+ \, \nu_\ell$, and further MC and data comparisons, can be found in Appendix B.

D. Tagging Efficiency Calibration
The reconstruction efficiency of the hadronic full reconstruction algorithm of Ref. [62] differs between simulated samples and the reconstructed data.This difference mainly arises due to imperfections, e.g. in the simulation of detector responses, particle identification efficiencies, or incorrect branching fractions in the reconstructed decay cascades.To address this, the reconstruction efficiency is calibrated using a data-driven approach and we follow closely the procedure outlined in Ref. [34].We reconstruct full reconstruction events by requiring exactly one lepton on the signal side, and apply the same B tag and lepton selection criteria outlined in the previous section.This B → X + ν enriched sample is divided into groups of subsamples according to the B tag decay channel and the multivariate classifier output O FR used in the hierarchical reconstruction.Each of these groups of subsamples is studied individually to derive a calibration factor for the hadronic tagging efficiency: the calibration factor is obtained by comparing the number of inclusive semileptonic B-meson decays, N (B → X + ν ), in data with the expectation from the simulated samples, N MC (B → X + ν ).The semileptonic yield is determined via a binned maximum likelihood fit using the the lepton energy spectrum.To reduce the modeling dependence of the B → X + ν sample this is done in a coarse granularity of five bins.The calibration factor of each these groups of subsamples is given by The free parameters in the fit are the yield of the semileptonic B → X + ν decays, the yield of backgrounds from fake leptons and the yield of backgrounds from true leptons.Approximately 1200 calibration factors are determined this way.The leading uncertainty on the C tag factors is from the assumed B → X + ν composition and the lepton PID performance, cf.Section V. 
We also apply corrections to the continuum efficiency. These are derived by using the off-resonance sample and comparing the number of reconstructed off-resonance events in data with the simulated on-resonance continuum events, correcting for differences in the selection.

TABLE III. Binning choices of the kinematic variables used in the fits.

Fit variable    Bins
$M_X$           [0, 1.5, 1.9, 2.5, 3.1, 4.0] GeV
$q^2$           [0, 2, 4, 6, 8, 10, 12, 14, 26] GeV$^2$

After the selection, we retain 9875 events. In order to determine the $B \to X_u \ell^+ \nu_\ell$ signal yield and constrain all backgrounds, we perform a binned likelihood fit of these events in several discriminating variables. To reduce the dependence on the precise modeling of the $B \to X_u \ell^+ \nu_\ell$ signal, we use coarse bins over regions that are very sensitive to the admixture of resonant and non-resonant decays, cf. Section II, and explore different variables for the signal extraction. The total likelihood function is constructed as the product of individual Poisson distributions $\mathcal{P}$,

$$ \mathcal{L} = \prod_{i}^{\rm bins} \mathcal{P}(n_i; \nu_i) \times \prod_k \mathcal{G}_k , \tag{20} $$

with $n_i$ denoting the number of observed data events and $\nu_i$ the total number of expected events in a given bin $i$.
Here, $\mathcal{G}_k$ are nuisance-parameter (NP) constraints, whose role is to incorporate systematic uncertainties of a source $k$ into the fit. Their construction is further discussed in Section V. The number of expected events in a given bin, $\nu_i$, is estimated using simulated collision events and is given by

$$ \nu_i = \sum_k f_{ik} \, \eta_k , \tag{21} $$

with $\eta_k$ denoting the total number of events from a given fit component $k$, and $f_{ik}$ denoting the fraction of such events being reconstructed in bin $i$ as determined by the MC simulation. The three fit components we determine are:

a) Signal $B \to X_u \ell^+ \nu_\ell$ events that fall inside the phase-space region of the partial branching fraction we wish to determine.

b) Signal $B \to X_u \ell^+ \nu_\ell$ events that fall outside said region, if applicable. This component can have shapes very similar to those of other backgrounds. We thus constrain this component in all fits to its expectation using the world average of $\mathcal{B}(B \to X_u \ell^+ \nu_\ell) = (2.13 \pm 0.30) \times 10^{-3}$ [39]. We also investigated different approaches, for instance linking this component with the component of a). This leads to small shifts of order 0.3-1% in the reported partial branching fractions that use this component.

c) Background events, which are dominated by $B \to X_c \ell^+ \nu_\ell$ and other decays that produce leptons in the final state (e.g. from $B \to h_1 h_2$ and $h_2 \to h_3 \ell^- \nu$, with $h_1$, $h_2$, and $h_3$ denoting hadronic final states). Other contributions are from misidentified lepton candidates and a small amount of continuum processes. A full description of all background processes is given in Section III.
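The structure of the expected bin counts and of the Poisson part of the likelihood can be illustrated with a short sketch. The templates and yields below are hypothetical, the function names are not from the analysis, and the nuisance-parameter constraint terms are omitted for brevity.

```python
import math


def expected_counts(yields, fractions):
    """nu_i = sum_k eta_k * f_ik, with fractions[k][i] the template of component k."""
    n_bins = len(fractions[0])
    return [sum(eta * f[i] for eta, f in zip(yields, fractions)) for i in range(n_bins)]


def neg_log_likelihood(observed, yields, fractions):
    """-ln L for a product of Poisson terms (constraint terms left out)."""
    nll = 0.0
    for n, nu in zip(observed, expected_counts(yields, fractions)):
        nll -= n * math.log(nu) - nu - math.lgamma(n + 1)
    return nll


# Hypothetical three-bin templates, each normalized to unity.
fractions = [
    [0.6, 0.3, 0.1],  # signal-like shape
    [0.1, 0.3, 0.6],  # background-like shape
]
nu = expected_counts([100.0, 400.0], fractions)
observed = [round(x) for x in nu]

# The likelihood prefers the yields that generated the observed counts.
nll_true = neg_log_likelihood(observed, [100.0, 400.0], fractions)
nll_shifted = neg_log_likelihood(observed, [150.0, 350.0], fractions)
print(nll_true < nll_shifted)
```

A fit would minimize this function with respect to the yields, and in the full analysis also with respect to the nuisance parameters.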
We carry out five separate fits to measure three partial branching fractions, using different discriminating variables to determine the $B \to X_u \ell^+ \nu_\ell$ yield. The fits and variables are:

1. The hadronic mass, $M_X$: Signal is expected to predominantly populate the low hadronic mass region, whereas the remaining $B \to X_c \ell^+ \nu_\ell$ background will produce a sharp peak at around $M_X \approx 2$ GeV. The sizeable resolution on the reconstruction of the $X$ system causes a non-negligible amount of these backgrounds to also be present in the low and high $M_X$ regions. The determined signal yields are used to measure the partial branching fraction with $M_X < 1.7$ GeV and $E_\ell^B > 1$ GeV. We thus use two signal templates and split events according to generator-level $M_X < 1.7$ GeV and $M_X > 1.7$ GeV.
2. The four-momentum-transfer squared, $q^2$: Signal will on average have a higher $q^2$ than $B \to X_c \ell^+ \nu_\ell$ background, whose kinematic endpoint is $q^2 \approx 11.6$ GeV$^2$. However, the reconstructed $q^2$ of $B \to X_c \ell^+ \nu_\ell$ events is smeared over the entire kinematic range due to the sizeable resolution in the reconstruction of the inclusive $X$ system and the $B_{\rm tag}$ reconstruction. To reduce background from $B \to X_c \ell^+ \nu_\ell$ events, we require the reconstructed $M_X$ to be smaller than 1.7 GeV. The determined signal yields are used to measure the partial branching fraction with $M_X < 1.7$ GeV, $q^2 > 8$ GeV$^2$, and $E_\ell^B > 1$ GeV. We use two signal templates: template a) is defined as signal events with generator-level values of $M_X < 1.7$ GeV and $q^2 > 8$ GeV$^2$, and template b) contains all other signal events.

3. The lepton energy in the $B$ meson rest-frame, $E_\ell^B$: Signal and $B \to X_c \ell^+ \nu_\ell$ can be separated beyond the kinematic endpoint of the $B \to X_c \ell^+ \nu_\ell$ background. The lepton energy is reconstructed using its momentum ($E_\ell^B = |\mathbf{p}_\ell^B|$), which has excellent resolution. This makes the measurement more sensitive to the exact composition of the $B \to X_c \ell^+ \nu_\ell$ background and $B \to X_u \ell^+ \nu_\ell$ signal. To minimize the dependence on the signal modeling, the endpoint of the lepton spectrum, $E_\ell^B \in [2.5, 2.7]$ GeV, is treated as a single coarse bin in the fit. To reduce the dependence on the exact modeling of $B \to X_c \ell^+ \nu_\ell$ we require $M_X < 1.7$ GeV. The determined signal yields are used to measure the partial branching fraction with $M_X < 1.7$ GeV, and the signal templates are split accordingly into a matching generator-level template and all other signal events.
4. The next fit also analyzes $E_\ell^B$, but uses the determined signal yields to measure the partial branching fraction with $E_\ell^B > 1$ GeV. Thus no separation of signal events into different categories is used.
5. The final fit uses $M_X$ and $q^2$ simultaneously in a two-dimensional fit ($M_X : q^2$). This fit also measures the partial branching fraction with $E_\ell^B > 1$ GeV, and no separation into different categories of signal events is used.
A summary of the binning choices of the kinematic variables is provided in Table III. We remove events with $M_X > 4$ GeV in all fits and also exclude events with negative $q^2$ values in the $M_X$, $M_X : q^2$, and $q^2$ fits. The likelihood Eq. 20 is numerically maximized to fit the values of the different components, $\eta_k$, from the observed events, using the sequential least squares programming method implementation of Ref. [65]. Confidence intervals are constructed using the profile likelihood ratio method. For a given component $\eta_k$ the ratio is

$$ \lambda(\eta_k) = -2 \ln \frac{\mathcal{L}(\eta_k, \hat{\boldsymbol{\eta}}_{\eta_k}, \hat{\boldsymbol{\theta}}_{\eta_k})}{\mathcal{L}(\hat{\eta}_k, \hat{\boldsymbol{\eta}}, \hat{\boldsymbol{\theta}})} , \tag{22} $$

where $\hat{\eta}_k$, $\hat{\boldsymbol{\eta}}$, $\hat{\boldsymbol{\theta}}$ are the values of the component of interest, the remaining components, and a vector of nuisance parameters (NPs), respectively, that maximize the likelihood function, whereas the remaining components $\hat{\boldsymbol{\eta}}_{\eta_k}$ and nuisance parameters $\hat{\boldsymbol{\theta}}_{\eta_k}$ maximize the likelihood for the specific value $\eta_k$. In the asymptotic limit, the test statistic Eq. 22 can be used to construct approximate confidence intervals through

$$ 1 - {\rm CL} = \int_{\lambda(\eta_k)}^{\infty} f_{\chi^2}(x; 1\,{\rm dof}) \, {\rm d}x , $$

with $f_{\chi^2}(x; 1\,{\rm dof})$ denoting the $\chi^2$ distribution of the variable $x$ with a single degree of freedom, and CL denoting the desired confidence level. The determined signal yields $\eta_k = \eta_{\rm sig}$ are translated into partial branching fractions via

$$ \Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell, {\rm Reg.}) = \frac{\eta_{\rm sig} \cdot \epsilon_{\Delta\mathcal{B}({\rm Reg.})}}{4 \, (\epsilon_{\rm tag} \cdot \epsilon_{\rm sel}) \cdot N_{BB}} . $$

Here $\epsilon_{\rm tag}$ denotes the tagging efficiency, as determined after applying the calibration factor introduced in Section III D. Further, $\epsilon_{\rm sel}$ and $\epsilon_{\Delta\mathcal{B}({\rm Reg.})}$ denote the signal-side selection efficiency and a correction to the efficiency to account for the fraction of the $B \to X_u \ell^+ \nu_\ell$ phase-space region that is measured. The factor of 4 in the denominator arises because each of the $N_{BB} = (771.58 \pm 9.78) \times 10^6$ $B$ meson pairs contains two $B$ mesons and because we average over electron and muon final states.
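For a single degree of freedom, the relation between the confidence level and the value of the profile likelihood ratio can be evaluated in closed form, since the $\chi^2$ tail probability above $x$ equals $\mathrm{erfc}(\sqrt{x/2})$. The following sketch uses only this identity; the function names are illustrative and the bisection search is one possible choice for the inversion, not necessarily what the analysis uses.

```python
import math


def confidence_level(lambda_value):
    """CL such that 1 - CL is the chi2 (1 dof) tail probability above lambda_value."""
    return math.erf(math.sqrt(lambda_value / 2.0))


def lambda_threshold(cl, lo=0.0, hi=50.0, tol=1e-10):
    """Invert confidence_level by bisection; valid because it rises monotonically."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if confidence_level(mid) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


cl_one_sigma = confidence_level(1.0)
threshold_90 = lambda_threshold(0.90)
print(f"lambda = 1 corresponds to CL = {cl_one_sigma:.4f}")
print(f"CL = 90% corresponds to lambda = {threshold_90:.3f}")
```

The interval for a component is then the range of $\eta_k$ values for which the profile likelihood ratio stays below the chosen threshold.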
To validate the fit procedure we generated ensembles of pseudoexperiments for different input branching fractions for $B \to X_u \ell^+ \nu_\ell$ signal and $B \to X_c \ell^+ \nu_\ell$ background. Fits to these ensembles show no biases in the central values and no under- or overcoverage of the confidence intervals. Using the current world average of $\mathcal{B}(B \to X_u \ell^+ \nu_\ell) = (2.13 \pm 0.30) \times 10^{-3}$, we expect approximately 930 to 2070 $B \to X_u \ell^+ \nu_\ell$ signal events, with significances $s = \eta_{\rm sig} / \epsilon$ ranging from about 9 to 15 standard deviations depending on the signal region under study, and with $\epsilon$ being the expected fit error determined from Asimov data sets [66].
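The idea of such an ensemble test can be illustrated with a strongly simplified stand-in: a single-bin counting experiment instead of the full template fit. Everything in this sketch (the true yield, the number of pseudoexperiments, the $\sqrt{n}$ uncertainty estimate, the function names) is an assumption chosen for illustration only.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (moderate means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def run_pseudoexperiments(true_yield, n_toys, seed=1):
    """Return the mean pull and the coverage of the +-1 sigma interval."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(n_toys):
        n = sample_poisson(rng, true_yield)
        sigma = math.sqrt(n) if n > 0 else 1.0  # naive uncertainty estimate
        pulls.append((n - true_yield) / sigma)
    mean_pull = sum(pulls) / n_toys
    coverage = sum(abs(p) <= 1.0 for p in pulls) / n_toys
    return mean_pull, coverage


mean_pull, coverage = run_pseudoexperiments(true_yield=100.0, n_toys=2000)
print(f"mean pull = {mean_pull:.3f}, coverage = {coverage:.3f}")
```

An unbiased estimator with correct uncertainties gives a mean pull near zero and a coverage near 68%; the same two quantities are what one would inspect for the full fit.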

V. SYSTEMATIC UNCERTAINTIES
Several systematic uncertainties affect the determination of the reported partial branching fractions. The most important uncertainties arise from the modeling of the $B \to X_u \ell^+ \nu_\ell$ signal component and from the tagging calibration correction. These are followed by uncertainties on the particle identification of kaons and leptons, the uncertainty on the number of $B$-meson pairs, the statistical uncertainty of the MC samples used, and uncertainties related to the efficiency of the track reconstruction. Table IV summarizes the systematic uncertainties for the five measured partial branching fractions probing three phase-space regions. The table separates uncertainties that originate from the background subtraction ('Additive uncertainties') and uncertainties related to the translation of the fitted signal yields into partial branching fractions ('Multiplicative uncertainties').
The tagging calibration uncertainties are evaluated by producing different sets of calibration factors. These sets take into account the correlation structure from common systematic uncertainties (cf. Section III D) and the fact that individual channels and ranges of the output classifier are statistically independent. When applying the different sets of calibration factors, we notice only negligible changes in the signal and background template shapes, but the overall tagging efficiency is affected. The associated uncertainty on the calibration factors is found to be 3.6% and is identical for the five measured partial branching fractions.

The $B \to X_u \ell^+ \nu_\ell$ and $B \to X_c \ell^+ \nu_\ell$ modeling uncertainties directly affect the shapes of the $M_X$, $q^2$, and $E_\ell^B$ signal and background distributions. Further, the $B \to X_u \ell^+ \nu_\ell$ modeling affects the overall reconstruction efficiencies and the migrations of events into and out of the phase-space regions we measure. We evaluate the uncertainties on the composition of the hybrid $B \to X_u \ell^+ \nu_\ell$ MC by variations of the branching fractions and form factors. The uncertainty on non-resonant $B \to X_u \ell^+ \nu_\ell$ contributions in the hybrid model is estimated by changing the underlying model from that of DFN [54] to that of BLNP [17]. In addition, the uncertainties on the DFN parameters $m_b^{1S}$ and $a$ (cf. Section II) are incorporated. For each of these variations, new hybrid weights are calculated to propagate the uncertainties into shapes and efficiencies.

We estimate the uncertainties of the $X_u$ fragmentation into $s\bar{s}$ quark pairs by variations of the corresponding JETSET parameter $\gamma_s$ (cf. Ref. [57]). As our BDT is trained to reject final states with kaon candidates, a change in this fraction will directly impact the signal efficiency. The $s\bar{s}$ production probability has been measured by Refs. [67, 68] at center-of-mass energies of 12 and 36 GeV with values of $\gamma_s = 0.35 \pm 0.05$ and $\gamma_s = 0.27 \pm 0.06$, respectively. We adopt the value and error of $\gamma_s = 0.30 \pm 0.09$, which spans the range of both measurements including their uncertainties. The $X_u$ system of the non-resonant signal component is hadronized by JETSET into final states with two or more pions. We test the impact on the signal efficiency by changing the post-fit charged pion multiplicity of non-resonant $B \to X_u \ell^+ \nu_\ell$ to the distribution observed in data in the signal enriched region of $M_X < 1.7$ GeV (cf. Section VIII E and Appendix C).

The $B \to X_c \ell^+ \nu_\ell$ background after the BDT selection is dominated by $B \to D \ell^+ \nu_\ell$ and $B \to D^* \ell^+ \nu_\ell$ decays. We evaluate the uncertainties on the modeling of $B \to D \ell^+ \nu_\ell$, $B \to D^* \ell^+ \nu_\ell$, and $B \to D^{**} \ell^+ \nu_\ell$ by variations of the BGL parameters and heavy quark form factors within their uncertainties. In addition, we propagate the branching fraction uncertainties. The uncertainties on the $B \to X_c \ell^+ \nu_\ell$ gap branching fractions are taken to be large enough to account for the difference between the sum of all measured exclusive branching fractions and the measured inclusive branching fraction. We also evaluate the impact on the efficiency of the lepton- and hadron-identification uncertainties, and of the overall tracking efficiency uncertainty. The statistical uncertainty on all generated MC samples is also evaluated and propagated into the systematic errors.
We incorporate the effect of additive systematic uncertainties directly into the likelihood function. This is done by introducing a vector of NPs, $\boldsymbol{\theta}_k$, for each fit template of a process $k$ (e.g. signal or background). Each element of this vector represents one bin of the fitted observables of interest (e.g. $M_X$, $q^2$, $E_\ell^B$, or a 2D bin of $M_X : q^2$). These NPs are constrained parameters in the likelihood Eq. 20 using multivariate Gaussian distributions, $\mathcal{G}_k = \mathcal{G}_k(\mathbf{0}; \boldsymbol{\theta}_k, \Sigma_k)$. Here $\Sigma_k$ denotes the systematic covariance matrix for a given template $k$. The covariance $\Sigma_k$ is the sum over all possible uncertainty sources for a given template $k$,

$$ \Sigma_k = \sum_s^{\rm error\ sources} \Sigma_{ks} , $$

with $\Sigma_{ks}$ denoting the covariance matrix of error source $s$.
The covariance matrices $\Sigma_{ks}$ depend on uncertainty vectors $\boldsymbol{\sigma}_{ks}$, which represent the absolute error in bins of the fit variable of template $k$. Uncertainties from the same error source are either fully correlated or, for the case of MC or other statistical uncertainties, treated as uncorrelated. The two cases can be expressed as $\Sigma_{ks} = \boldsymbol{\sigma}_{ks} \otimes \boldsymbol{\sigma}_{ks}$ or $\Sigma_{ks} = {\rm Diag}(\boldsymbol{\sigma}_{ks}^2)$, respectively. For particle identification uncertainties, we estimate $\Sigma_{ks}$ using sets of correction tables, sampled according to their statistical and systematic uncertainties. The systematic NPs are incorporated in Eq. 21 by rewriting the fractions $f_{ik}$ for all templates as

$$ f_{ik} = \frac{\eta_{ik}^{\rm MC}}{\sum_j \eta_{jk}^{\rm MC}} \to \frac{\eta_{ik}^{\rm MC} (1 + \theta_{ik})}{\sum_j \eta_{jk}^{\rm MC} (1 + \theta_{jk})} , $$

to take into account changes in the signal or background shape. Here $\eta_{ik}^{\rm MC}$ denotes the predicted number of MC events of a given bin $i$ and a process $k$, and $\theta_{ik}$ is the associated nuisance parameter constrained by $\mathcal{G}_k$.
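The two covariance constructions and their sum over error sources can be made concrete with a small sketch. The two-bin uncertainty vectors are hypothetical and the helper names are introduced here for illustration only.

```python
def fully_correlated(sigma):
    """Outer product sigma (x) sigma: every pair of bins is 100% correlated."""
    return [[a * b for b in sigma] for a in sigma]


def uncorrelated(sigma):
    """Diag(sigma^2): used for statistical sources such as limited MC samples."""
    return [[s * s if i == j else 0.0 for j in range(len(sigma))]
            for i, s in enumerate(sigma)]


def total_covariance(sources):
    """Element-wise sum of the covariance matrices of all error sources."""
    n = len(sources[0])
    return [[sum(m[i][j] for m in sources) for j in range(n)] for i in range(n)]


# Hypothetical absolute errors in two bins of one template.
sigma_model = [1.0, 2.0]    # a fully correlated modeling source
sigma_mc_stat = [0.5, 0.5]  # an uncorrelated statistical source

cov = total_covariance([fully_correlated(sigma_model), uncorrelated(sigma_mc_stat)])
print(cov)
```

The off-diagonal elements come only from the correlated source, while both sources add to the diagonal.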
VI. $B \to X_c \ell \nu$ CONTROL REGION

Figure 5 compares the reconstructed $M_X$, $q^2$, and $E_\ell^B$ distributions with the expectation from MC before applying the background suppression BDT. All corrections are applied and the MC uncertainty contains all systematic uncertainties discussed in Section V. The agreement of $M_X$ and $q^2$ is excellent, but some differences in the shape of the lepton momentum spectrum are seen. This is likely due to imperfections of the modeling of the inclusive $B \to X_c \ell^+ \nu_\ell$ background. The discrepancy reduces in the $M_X < 1.7$ GeV region. The main results of this paper will be produced by fitting $q^2$ and $M_X$ in two dimensions. We use the lepton spectrum to measure the same regions of phase space, to validate the obtained results.
Figure 6 shows the reconstructed $M_X$, $q^2$, and $E_\ell^B$ distributions after the BDT selection is applied. The $B \to X_u \ell^+ \nu_\ell$ contribution is now clearly visible at low $M_X$ and high $E_\ell^B$, while the reconstructed events and the MC expectation show good agreement. The $B \to X_c \ell^+ \nu_\ell$ background is dominated by contributions from $B \to D \ell^+ \nu_\ell$ and $B \to D^* \ell^+ \nu_\ell$ decays, and the remaining background is predominantly from secondary leptons and misidentified lepton candidates.

FIG. 5. The $M_X$ and $q^2$ spectra of the selected candidates prior to applying the background BDT are shown. (Bottom) The $E_\ell^B$ spectrum of the selected candidates prior to applying the background BDT is shown for events with $M_X < 1.7$ GeV and $M_X > 1.7$ GeV.

FIG. 6. The $M_X$, $q^2$, and $E_\ell^B$ spectra after applying the background BDT but before the fit are shown. The $B \to X_u \ell^+ \nu_\ell$ contribution is shown in red and scaled to the world average of $\mathcal{B}(B \to X_u \ell^+ \nu_\ell) = (2.13 \pm 0.30) \times 10^{-3}$. The data and MC agreement is reasonable in all variables. The $E_\ell^B$ spectrum is shown with selections of $M_X < 1.7$ GeV and $M_X > 1.7$ GeV. The cut of $M_X < 1.7$ GeV is later used in the fit to reduce the dependence on the $B \to X_c \ell^+ \nu_\ell$ modeling of higher charmed states.

TABLE IV. Relative uncertainties [%] per phase-space region.

FIG. 7. The post-fit distributions of the one-dimensional fits to $M_X$ and $q^2$ are shown, corresponding to the measured partial branching fractions for $E_\ell^B > 1$ GeV with additional requirements of $M_X < 1.7$ GeV, and $M_X < 1.7$ GeV and $q^2 > 8$ GeV$^2$, respectively.

VIII. RESULTS
We report partial branching fractions for three phase-space regions from five fits to the reconstructed variables introduced in Section IV. All partial branching fractions correspond to a selection with $E_\ell^B > 1$ GeV, also reverting the effect of final state radiation photons, and possible additional phase-space restrictions. The resulting fit yields are listed in Table V.

A. Partial Branching Fraction Results
For the partial branching fraction with $M_X <$ 1. with a larger systematic and statistical uncertainty than Eq. 27.

To further probe the $B \to X_u \ell^+ \nu_\ell$ enriched region, we carry out a measurement for $M_X < 1.7$ GeV and $q^2 > 8$ GeV$^2$ from a fit to the $q^2$ spectrum. This selection only probes about 31% of the available $B \to X_u \ell^+ \nu_\ell$ phase space. We find

The corresponding post-fit distribution of $q^2$ is shown in the bottom panel of Figure 7.

The most precise determination of $B \to X_u \ell^+ \nu_\ell$ is obtained from a two-dimensional fit, exploiting the full combined discriminatory power of $M_X$ and $q^2$. The resulting partial branching fraction probes about 86% of the available $B \to X_u \ell^+ \nu_\ell$ phase space. We measure

$$ \Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell) = (1.59 \pm 0.07 \pm 0.16) \times 10^{-3} . \tag{30} $$

The projection of the 2D fit onto $M_X$ and the $q^2$ distribution for the signal enriched region of $M_X < 1.5$ GeV are shown in Figure 9. The remaining $q^2$ distributions are given in Appendix D. The partial branching fraction is also in good agreement with the measurement obtained by fitting $E_\ell^B$, covering the same phase space (cf. Figure 8):

The uncertainties are larger, but both results are compatible. The nuisance parameter pulls of all fits are provided in Appendix D. The result of Eq. 30 can be further compared with the most precise measurement to date of this region of Ref. [69], where $\Delta\mathcal{B}(B \to X_u \ell \nu) = (1.55 \pm 0.12) \times 10^{-3}$, and shows good agreement. The measurement can also be compared to Ref. [15], which uses a similar experimental approach.
The partial branching fraction measured there for $E_\ell^B > 1$ GeV is $\Delta\mathcal{B}(B \to X_u \ell \nu) = (1.82 \pm 0.19) \times 10^{-3}$, which is compatible with Eq. 30 within 0.9 standard deviations. Using a similar approach and the same phase space, Belle previously reported in Ref. [16] a higher value of $\Delta\mathcal{B}(B \to X_u \ell \nu) = (1.96 \pm 0.19) \times 10^{-3}$. We cannot quantify the statistical overlap between both results, but by comparing the number of determined signal events one can estimate it to be below 55%. The dominant systematic uncertainties of Ref. [16] were evaluated using different approaches, but when fully correlating the dominant systematic uncertainties and assuming a statistical correlation of 55% we obtain a compatibility of 1.7 standard deviations. The main difference between this analysis and Ref. [16] lies in the modeling of signal and background processes: since its publication our understanding has improved, and more precise measurements of branching fractions and form factors have become available. Further, for the $B \to X_u \ell^+ \nu_\ell$ signal process a hybrid approach is adopted in this paper (see Section II and Appendix A), whereas Ref. [16] used an alternative approach to model signal as a mix of inclusive and exclusive decay modes. Note that this work supersedes Ref. [16].

B. |V ub | Determination
We determine |V ub | from the measured partial branching fractions using a range of theoretical rate predictions.In principle, the total B → X u + ν decay rate can be calculated using the same approach as B → X c + ν using the heavy quark expansion (HQE) in inverse powers of m b .Unfortunately, the measurement requirements necessary to separate B → X u + ν from the dominant B → X c + ν background spoil the convergence of this approach.In the predictions for the partial rates corresponding to our measurements, perturbative and nonperturbative uncertainties are largely enhanced and as outlined in the introduction the predictions are sensitive to the shape function modeling.
The relationship between measured partial branching fractions, predictions of the rate (omitting CKM factors) $\Delta\Gamma(B \to X_u \ell^+ \nu_\ell)$, and $|V_{ub}|$ is

$$ |V_{ub}| = \sqrt{\frac{\Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell)}{\tau_B \, \Delta\Gamma(B \to X_u \ell^+ \nu_\ell)}} , \tag{32} $$

with $\tau_B = (1.579 \pm 0.004)$ ps denoting the average of the charged and neutral $B$ meson lifetime [39].

FIG. 9. The post-fit projection of the two-dimensional fit to $M_X : q^2$ on $M_X$ and the $q^2$ distribution in the range of $M_X \in [0, 1.5]$ GeV are shown. The resulting yields are corrected to correspond to a partial branching fraction with $E_\ell^B > 1$ GeV. The remaining $q^2$ distributions are given in Figure 22 (Appendix D).

We use four predictions for the theoretical partial rates. All predictions use the same input values as Ref. [6] chooses for their world averages. The four predictions are:
-BLNP: The prediction of Bosch, Lange, Neubert, and Paz (short BLNP) of Ref. [17] provides a prediction at next-to-leading-order accuracy in terms of the strong coupling constant $\alpha_s$ and incorporates all known corrections. Predictions are interpolated between the shape-function dominated region (endpoint of the lepton spectrum, small hadronic mass) and the region of phase space that can be described via the operator product expansion (OPE).
-DGE: The Dressed Gluon Exponentiation (short DGE) from Andersen and Gardi [19, 20] makes predictions by avoiding the direct use of shape functions, but produces predictions for hadronic observables using the on-shell $b$-quark mass. The calculation is carried out in the $\overline{\rm MS}$ scheme and we use $m_b(\overline{\rm MS}) = 4.19 \pm 0.04$ GeV.
-GGOU: The prediction from Gambino, Giordano, Ossola, and Uraltsev [18] (short GGOU) incorporates all known perturbative and non-perturbative effects up to the order $\mathcal{O}(\alpha_s^2 \beta_0)$ and $\mathcal{O}(1/m_b^3)$, respectively. The shape function dependence is incorporated by parametrizing its effects in each structure function with a single light-cone function. The calculation is carried out in the kinetic scheme and we use as inputs $m_b^{\rm kin} = 4.55 \pm 0.02$ GeV and
-ADFR: The calculation of Aglietti, Di Lodovico, Ferrera, and Ricciardi [21, 22] makes use of the ratio of $B \to X_u \ell^+ \nu_\ell$ to $B \to X_c \ell^+ \nu_\ell$ rates and of soft-gluon resummation at next-to-next-to-leading order with an effective QCD coupling approach. The calculation uses the $\overline{\rm MS}$ scheme and we use $m_b(\overline{\rm MS}) = 4.19 \pm 0.04$ GeV.
Table VI lists the decay rates and their associated uncertainties for the probed regions of phase space, which we use to extract |V ub | from the measured partial branching fractions with Eq. 32.
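As a rough numerical illustration of how a partial branching fraction and a predicted partial rate combine into $|V_{ub}|$, the sketch below takes the square root of the branching fraction divided by the product of lifetime and rate. The branching fraction and lifetime are the values quoted in the text; the partial rate of 60 ps$^{-1}$ is a placeholder chosen for illustration and is not an entry of Table VI. Only the statistical uncertainty is propagated, using the fact that a square root halves relative uncertainties.

```python
import math

TAU_B_PS = 1.579  # average B meson lifetime in ps, as quoted in the text


def vub_from_partial_bf(delta_bf, delta_gamma_per_ps, tau_b_ps=TAU_B_PS):
    """|V_ub| = sqrt(delta_bf / (tau_B * delta_gamma)); rate given without |V_ub|^2."""
    return math.sqrt(delta_bf / (tau_b_ps * delta_gamma_per_ps))


def propagate_bf_uncertainty(vub, delta_bf, delta_bf_err):
    """Uncertainty on |V_ub| from the branching fraction: half the relative error."""
    return 0.5 * vub * delta_bf_err / delta_bf


# Placeholder partial rate (assumption), in units of ps^-1.
vub = vub_from_partial_bf(delta_bf=1.59e-3, delta_gamma_per_ps=60.0)
vub_stat = propagate_bf_uncertainty(vub, delta_bf=1.59e-3, delta_bf_err=0.07e-3)
print(f"|V_ub| = ({vub * 1e3:.2f} +/- {vub_stat * 1e3:.2f} (stat)) x 10^-3")
```

The theory uncertainty would enter in the same way through the relative uncertainty of the predicted rate.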

C. |V ub | Results
From the partial branching fractions with $E_\ell^B > 1$ GeV and $M_X < 1.7$ GeV determined from fitting $M_X$ we find

The uncertainties denote the statistical uncertainty, the systematic uncertainty, and the theory error from the partial rate prediction. For the partial branching fraction with $E_\ell^B > 1$ GeV, $M_X < 1.7$ GeV, and $q^2 > 8$ GeV$^2$

In order to quote a single value for $|V_{ub}|$ we adapt the procedure of Ref. [25] and calculate a simple arithmetic average of the most precise determinations in Eq. 35 to obtain

$$ |V_{ub}| = (4.10 \pm 0.09 \pm 0.22 \pm 0.15) \times 10^{-3} . $$

This value is larger, but compatible with the exclusive measurement of 12) × 10 −3 within 1.3 standard deviations.

D. Stability Checks
To check the stability of the result we redetermine the partial branching fractions using two additional working points. We change the BDT selection to increase and decrease the amount of $B \to X_c \ell^+ \nu_\ell$ and other backgrounds, and repeat the full analysis procedure. The resulting values of $\Delta\mathcal{B}(B \to X_u \ell \nu)$ are determined using the two-dimensional fit of $M_X : q^2$ and are shown in Figure 10; the background level changes by +37% and −33%, respectively. The small shifts in central value are well contained within the quoted systematic uncertainties. To further estimate the compatibility of the results we determine the full statistical and systematic correlations and find that the partial branching fractions with looser and tighter BDT selections are in agreement with the nominal result within 1.1 and 1.4 standard deviations, respectively.

FIG. 10. The stability of the determined partial branching fraction $\Delta\mathcal{B}(B \to X_u \ell \nu)$ using the $M_X : q^2$ fit is studied as a function of the BDT selection requirement. The classifier output selections of 0.83 and 0.87 correspond to signal efficiencies after the pre-selection of 22% and 15%, respectively. These selections increase or decrease the background from $B \to X_c \ell^+ \nu_\ell$ and other processes by 37% and 33%, respectively. The grey and yellow bands show the total and statistical error, respectively, with the nominal BDT working point of 0.85.
The modeling of the $B \to X_u \ell^+ \nu_\ell$ signal composition is crucial to all presented measurements. One aspect that is difficult to assess is the $X_u$ fragmentation simulation: the charmless $X_u$ state can decay via many different channels producing a number of charged or neutral pions or kaons. In Section V we discussed how we assess the uncertainty on the number of $s\bar{s}$ quark pairs produced in the $X_u$ fragmentation. Because the BDT removes such events to suppress the dominant $B \to X_c \ell^+ \nu_\ell$ background, no signal-enriched region can be easily obtained for them. The accuracy of the fragmentation into the number of charged pions can be tested in the signal enriched region of $M_X < 1.7$ GeV. Figure 11 compares the charged pion multiplicity between simulated signal and background processes and data. The signal and background predictions are scaled to their respective normalizations obtained from the two-dimensional fit in $M_X : q^2$. The uncertainty band shown on the MC includes the full systematic uncertainties discussed in Section V. The agreement overall lies within the assigned uncertainties, with the data having more events in the zero multiplicity bin and fewer in the two charged pion multiplicity bin. We use this distribution to correct our simulation and to assign an additional uncertainty from the charged pion fragmentation. More details can be found in Section V and Appendix C.

F. Lepton Flavor Universality and Weak Annihilation Contributions
To test lepton flavor universality in $B \to X_u \ell^+ \nu_\ell$ we also carry out fits to determine the partial branching fraction for electron and muon final states. For this we categorize the selected events accordingly and carry out a fit to the $M_X : q^2$ distributions using the same granularity as the fit described in Section VIII A. We carry out a simultaneous analysis of both samples, such that shared NPs for the modeling of the signal or background components are correctly correlated. The resulting yields are corrected to a partial branching fraction with $E_\ell^B > 1$ GeV and we obtain

$$ \Delta\mathcal{B}(B \to X_u e^+ \nu_e) = (1.57 \pm 0.10 \pm 0.16) \times 10^{-3} , $$
$$ \Delta\mathcal{B}(B \to X_u \mu^+ \nu_\mu) = (1.62 \pm 0.10 \pm 0.18) \times 10^{-3} , $$

with a total correlation of $\rho = 0.53$. The ratio of the electron to the muon final state is

with the first error denoting the statistical uncertainty and the second the systematic uncertainty. We observe no significant deviation from lepton flavor universality. More details on the fit can be found in Appendix E.

Isospin breaking effects can be studied by separately measuring the partial branching fraction for charged and neutral $B$ meson final states. We determine the ratio

by using the information from the composition of the fully reconstructed tag-side $B$-meson decays to separate charged and neutral $B$ candidates. The partial branching fraction is then determined by a simultaneous fit of both samples in $M_X : q^2$ to correctly correlate common systematic uncertainties. To account for the small contamination of wrongly assigned $B_{\rm tag}$ flavors, we use the wrong-tag fractions from our simulation.
Using this procedure we determine for the individual partial branching fractions with $\Delta\mathcal{B}(B^0 \to X_u \ell^+ \nu_\ell) = (1.51 \pm 0.10 \pm 0.16) \times 10^{-3}$ with a total correlation of $\rho = 0.50$ and for the ratio Eq. 40 compatible with the expectation of equal semileptonic rates for both isospin states. Isospin breaking effects would for instance arise from weak annihilation contributions, which can only contribute to charged $B$ meson final states. Using Eq. 47 the relative contribution from weak annihilation processes to the total semileptonic $B \to X_u \ell^+ \nu_\ell$ rate can be constrained via

Here $f_u$ is a factor that corrects the measured partial branching fraction to the full inclusive phase space. We estimate it using the DFN model [54] (cf. Section II for details) and find $f_u = 0.86$. We further assume that $f_{\rm wa} = 1$, as such processes would produce a high momentum lepton. We recover

which translates into a limit of [−0.14, 0.17] at 90% CL. This result is more stringent than the limit of Ref. [15], but weaker than the result of Ref. [70], which directly used the shape of the $q^2$ distribution to constrain weak annihilation processes. Our result is also weaker than the estimates of Refs. [71-74], which constrain weak annihilation contributions to be of the order of 2-3%.
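How a correlation between two measurements enters the uncertainty of their ratio can be sketched with first-order error propagation. The example uses the electron and muon central values and the total correlation quoted above, with statistical and systematic uncertainties combined in quadrature. This is a simplification introduced here: it treats the two total uncertainties as a single correlated pair and is not expected to reproduce the separate statistical and systematic uncertainties of the ratio obtained from the simultaneous fit.

```python
import math


def ratio_with_uncertainty(a, sigma_a, b, sigma_b, rho):
    """First-order error propagation for r = a / b with correlation rho."""
    r = a / b
    rel_var = ((sigma_a / a) ** 2 + (sigma_b / b) ** 2
               - 2.0 * rho * sigma_a * sigma_b / (a * b))
    return r, abs(r) * math.sqrt(rel_var)


# Values in units of 10^-3; the common scale cancels in the ratio.
sigma_e = math.hypot(0.10, 0.16)   # statistical (+) systematic in quadrature
sigma_mu = math.hypot(0.10, 0.18)
ratio, sigma_ratio = ratio_with_uncertainty(1.57, sigma_e, 1.62, sigma_mu, rho=0.53)
print(f"R = {ratio:.2f} +/- {sigma_ratio:.2f} (total, first-order estimate)")
```

A positive correlation reduces the uncertainty of the ratio relative to the uncorrelated case, which is why the simultaneous fit with shared nuisance parameters matters for this test.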

IX. SUMMARY AND CONCLUSIONS
We report measurements of partial branching fractions with different requirements on the properties of the hadronic system of the $B \to X_u \ell^+ \nu_\ell$ decay and with a lepton energy of $E_\ell^B > 1$ GeV in the $B$ rest-frame, covering 31-86% of the available phase space. The sizeable background from semileptonic $B \to X_c \ell^+ \nu_\ell$ decays is suppressed using multivariate methods in the form of a BDT. This approach allows us to reduce such backgrounds to an acceptable level, whilst retaining a high signal efficiency. Signal yields are obtained using a binned likelihood fit in either the reconstructed hadronic mass $M_X$, the four-momentum-transfer squared $q^2$, or the lepton energy $E_\ell^B$. The most precise result is obtained from a two-dimensional fit of $M_X$ and $q^2$. Translated to a partial branching fraction for $E_\ell^B > 1$ GeV we obtain

$$ \Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell) = (1.59 \pm 0.07 \pm 0.16) \times 10^{-3} , $$

with the errors denoting statistical and systematic uncertainties. The partial branching fraction is compatible with the value obtained by a fit of the lepton energy spectrum $E_\ell^B$ and with the most precise determination of Ref. [69]. In addition, it is stable under variations of the background suppression BDT. From this partial branching fraction we obtain a value of

$$ |V_{ub}| = (4.10 \pm 0.09 \pm 0.22 \pm 0.15) \times 10^{-3} $$

from an average over four theoretical calculations. This value is higher than, but compatible with, the value of $|V_{ub}|$ from exclusive determinations by 1.3 standard deviations. The compatibility with the value expected from CKM unitarity from a fit of Ref. [75] of $|V_{ub}| = (3.62^{+0.11}_{-0.08}) \times 10^{-3}$ is 1.6 standard deviations. Figure 12 summarizes the situation. The result presented here supersedes Ref. [16]: this paper uses a more efficient tagging algorithm, incorporates improvements of the $B \to X_u \ell^+ \nu_\ell$ signal and $B \to X_c \ell^+ \nu_\ell$ background descriptions, and analyzes the full Belle data set of 711 fb$^{-1}$. The measurement of kinematic differential shapes of $M_X$, $q^2$, and other properties is left for future work. These results will be crucial for future direct measurements with Belle II that will attempt to use data-driven methods to directly constrain the shape function using $B \to X_u \ell^+ \nu_\ell$ information.

FIG. 12. The obtained values of $|V_{ub}|$ from the four calculations and the arithmetic average are compared to the determination from exclusive $B \to \pi \ell^+ \nu_\ell$ and the expectation from CKM unitarity [75] without the direct constraints from semileptonic and leptonic decays.
Foundation. FB is dedicating this paper to his father Urs Bernlochner, who sadly passed away during the writing of this manuscript. We miss you so much. We thank the KEKB group for the excellent operation of the accelerator; the KEK cryogenics group for the efficient operation of the solenoid; the KEK computer group and the Pacific Northwest National Laboratory (PNNL) Environmental Molecular Sciences Laboratory (EMSL) computing group for strong computing support; and the National Institute of Informatics and Science Information NETwork 5 (SINET5) for valuable network support. We acknowledge support from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan, the Japan Society for the Promotion of Science (JSPS), and the Tau-

Figure 13 shows the generator level hybrid $B \to X_u \ell^+ \nu_\ell$ signal sample for $E_\ell^B$, $M_X$, and $q^2$ described in Section II.

This variable is not used in the signal extraction, but its modeling is tested to make sure that the $B \to X_u \ell^+ \nu_\ell$ fragmentation probabilities cannot bias the final result. The agreement in the signal enriched region with $M_X < 1.7$ GeV after the BDT selection is fair, but shows some deviations. We correct the generator level charged pion multiplicity to match the $n_{\pi^\pm}$ observed in this selection by assigning the non-resonant $B \to X_u \ell^+ \nu_\ell$ events a correction weight as a function of the true charged pion multiplicity. After this procedure the agreement is perfect and we use the difference in the reconstruction efficiency as an uncertainty from the pion fragmentation on the partial branching fractions and $|V_{ub}|$ (cf. Section V).
of the partial branching fraction fits, with $\hat{\theta}$ ($\theta$) corresponding to the post-fit (pre-fit) value of the nuisance parameter. Note that the uncertainty of each pull shows the post-fit error normalized to the pre-fit constraint $\Sigma_{k,ii}$.
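A pull of this kind can be sketched numerically as below. The defining equation is not recoverable from the text here, so the sketch assumes the common convention: the shift of the nuisance parameter divided by the square root of the diagonal pre-fit covariance element, with the displayed error being the post-fit error over the same pre-fit width. The function and argument names are hypothetical.

```python
import numpy as np

def nuisance_pulls(theta_post, theta_pre, cov_pre, sigma_post):
    """Return (pulls, pull_errors) for one fit category.

    Assumed convention (not confirmed by the text):
      pull_i  = (theta_post_i - theta_pre_i) / sqrt(cov_pre_ii)
      error_i = sigma_post_i / sqrt(cov_pre_ii)
    """
    theta_post = np.asarray(theta_post, dtype=float)
    theta_pre = np.asarray(theta_pre, dtype=float)
    sigma_post = np.asarray(sigma_post, dtype=float)
    prefit_sigma = np.sqrt(np.diag(np.asarray(cov_pre, dtype=float)))
    pulls = (theta_post - theta_pre) / prefit_sigma
    pull_errors = sigma_post / prefit_sigma
    return pulls, pull_errors

# Toy example with two bins and an uncorrelated pre-fit covariance.
pulls, errors = nuisance_pulls(
    theta_post=[0.5, -1.0],
    theta_pre=[0.0, 0.0],
    cov_pre=[[0.25, 0.0], [0.0, 4.0]],
    sigma_post=[0.25, 1.0],
)
```

A pull error well below one indicates that the fit constrains the parameter more tightly than its pre-fit constraint.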
Figure 22 shows the post-fit $q^2$ distributions of the two-dimensional fit to $M_X : q^2$ on $M_X$.

FIG. 22. The post-fit $q^2$ distributions of the two-dimensional fit to $M_X : q^2$ on $M_X$ are shown. The panels correspond to: $M_X \in [0, 1.5]$ GeV (top left), $M_X \in [1.5, 1.9]$ GeV (top right), $M_X \in [1.9, 2.4]$ GeV (bottom left) and $M_X \in [2.4, 4]$ GeV (bottom right). The resulting yields are corrected to correspond to a partial branching fraction with $E_\ell^B > 1$ GeV.

The fitted yields of the two-dimensional fit to $M_X : q^2$, separated into electron and muon candidates as well as into charged and neutral $B$ mesons, are listed in Table VII.

F. BDT EFFICIENCIES
Figure 23 shows the efficiency of the BDT selection as a function of the reconstructed variables $q^2$, $M_X$, and the lepton energy $E_\ell^B$ for simulated $B \to X_u \ell^+ \nu_\ell$ events. Although we avoided using these variables in the boosted decision tree, a residual dependence on the kinematic variables is seen. For instance, the efficiency increases with increasing $E_\ell^B$ and decreases toward high $q^2$. The efficiency as a function of the hadronic mass $M_X$ is relatively flat. This efficiency dependence is linked to the variables used in the BDT. Although we carefully avoided kinematic variables that would allow the BDT to learn these kinematic properties, there are indirect connections: e.g. high $E_\ell^B$ final states have a lower multiplicity as they are dominated by $B \to \pi \ell^+ \nu_\ell$ decays. Further, their corresponding hadronic system carries little momentum and on average such decays retain a better resolution in discriminating variables of the background suppression BDT. A concrete example is $M_{\mathrm{miss}}^2$ (cf. Figure 15): high multiplicity $B \to X_u \ell^+ \nu_\ell$ decays will retain a larger tail in this variable and will be selected with a lower efficiency by the BDT.
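An efficiency curve of this type, with statistical uncertainties only, can be sketched as follows. The snippet assumes unweighted simulated events and a simple binomial uncertainty; the function name, the inputs, and the binning are hypothetical and do not reproduce the analysis code.

```python
import numpy as np

def binned_efficiency(values, passed, bin_edges):
    """Efficiency and binomial uncertainty per bin of a kinematic variable.

    values    : reconstructed variable (e.g. q^2, M_X or E_l^B) per event
    passed    : boolean flag per event, True if the event passes the BDT cut
    bin_edges : bin boundaries of the variable
    """
    values = np.asarray(values, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    n_all, _ = np.histogram(values, bins=bin_edges)
    n_pass, _ = np.histogram(values[passed], bins=bin_edges)

    eff = np.full(len(n_all), np.nan)   # empty bins stay undefined
    err = np.full(len(n_all), np.nan)
    filled = n_all > 0
    eff[filled] = n_pass[filled] / n_all[filled]
    err[filled] = np.sqrt(eff[filled] * (1.0 - eff[filled]) / n_all[filled])
    return eff, err

# Toy example: six events in two bins of a generic variable.
eff, err = binned_efficiency(
    values=[1.1, 1.2, 1.3, 1.4, 2.1, 2.2],
    passed=[True, False, False, False, True, True],
    bin_edges=[1.0, 2.0, 3.0],
)
```

For weighted simulation, the binomial formula would need to be replaced by a weighted variance estimate.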

Fermi's constant $G_F$, the CKM matrix element $V_{ub}$ and the projection operator $P_L = (1 - \gamma_5)/2$. The decay $B \to \pi \ell \nu$ is shown at parton level and as an effective diagram in Figure 9.1.

Figure 9.1: One possible parton level Feynman diagram (a) and the effective Feynman diagram (b). In the effective Feynman diagram, the propagator of the $W$ is integrated out, i.e. the weak interaction is point-like, and the gluon interactions are described by the blob.

For the $b$ quark mass in the shape-function scheme we use $m_b^{\mathrm{SF}} = 4.61$ GeV and $\mu_\pi^{2\,\mathrm{SF}} = 0.20$ GeV$^2$. Figures detailing the hybrid model construction can be found in Appendix A.
FIG. 2. The generator-level $E_\ell^B$ and $M_X$ distributions of the CKM suppressed and favored inclusive semileptonic processes, $B \to X_u \ell^+ \nu_\ell$ (scaled up by a factor of 50) and

FIG. 3. The resolution of the reconstructed $M_X$ and $q^2$ values for $B \to X_u \ell^+ \nu_\ell$ signal is shown as a residual with respect to the generated values.

2. $D^*$ veto: We search for low momentum neutral and charged pions in the $X$ system with $|\mathbf{p}_\pi| < 220$ MeV, compatible with a $D^* \to D \pi$ transition.
with $m_{D^*}$ and $m_D$ denoting the $D^*$ and $D$ meson masses, respectively, and $E_\pi = \sqrt{m_\pi^2 + |\mathbf{p}_\pi|^2}$ is the energy of the slow pion. Using the $D^*$ candidate four-momentum $p_{D^*} = (E_{D^*}, \mathbf{p}_{D^*})$ we can calculate
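The slow-pion kinematics can be illustrated with a short sketch. The pion energy follows the expression given in the text. The $D^*$ energy estimate $E_{D^*} \approx E_\pi \, m_{D^*} / (m_{D^*} - m_D)$ and the choice of the $D^*$ direction along the slow pion are assumptions of this sketch (a commonly used approximation), since the corresponding equation is not recoverable here. Masses are approximate values in GeV, and all function names are hypothetical.

```python
import math

# Approximate masses in GeV (charged pion, D*+, D0).
M_PI_CHARGED = 0.13957
M_DSTAR_PLUS = 2.01026
M_D0 = 1.86484

def slow_pion_energy(p_pi, m_pi=M_PI_CHARGED):
    """E_pi = sqrt(m_pi^2 + |p_pi|^2) for a three-momentum (px, py, pz)."""
    px, py, pz = p_pi
    return math.sqrt(m_pi ** 2 + px * px + py * py + pz * pz)

def dstar_four_momentum(p_pi, m_pi=M_PI_CHARGED,
                        m_dstar=M_DSTAR_PLUS, m_d=M_D0):
    """Estimate (E_D*, p_D*) from the slow pion alone.

    ASSUMPTION: E_D* ~ E_pi * m_D* / (m_D* - m_D), direction along the pion.
    """
    e_pi = slow_pion_energy(p_pi, m_pi)
    e_dstar = e_pi * m_dstar / (m_dstar - m_d)
    # Clamp at zero: for very soft pions the estimate can fall below m_D*.
    p_mag = math.sqrt(max(e_dstar ** 2 - m_dstar ** 2, 0.0))
    norm = math.sqrt(sum(c * c for c in p_pi))
    if norm == 0.0:
        direction = (0.0, 0.0, 0.0)
    else:
        direction = tuple(c / norm for c in p_pi)
    return e_dstar, tuple(p_mag * d for d in direction)

# Toy example: a 100 MeV pion along z, inside the |p_pi| < 220 MeV window.
e_dstar, p_dstar = dstar_four_momentum((0.0, 0.0, 0.1))
```

The resulting four-momentum could then enter quantities such as a missing mass squared built with the $D^*$ candidate.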

FIG. 4. The shape of the background suppression classifier $\mathcal{O}_{\mathrm{BDT}}$ is shown. MC is divided into $B \to X_u \ell^+ \nu_\ell$ signal, the dominant $B \to X_c \ell^+ \nu_\ell$ background, and all other contributions. To increase visibility, the $B \to X_u \ell^+ \nu_\ell$ component is shown with a scaling factor (red dashed line). The uncertainties on the MC contain the full systematic errors and are further discussed in Section V.
FIG. 7. The post-fit distributions of the one-dimensional fits to $M_X$ and $q^2$ are shown, corresponding to the measured partial branching fractions for $E_\ell^B > 1$ GeV with additional requirements of $M_X < 1.7$ GeV, and $M_X < 1.7$ GeV and $q^2 > 8$ GeV$^2$, respectively.

FIG. 8. The post-fit distribution of the fit to $E_\ell^B$ with $M_X < 1.7$ GeV is shown. The resulting yields were corrected to correspond to the partial branching fraction with $E_\ell^B > 1$ GeV with and without an additional requirement of $M_X < 1.7$ GeV, respectively.

The background contamination changes by
FIG. 10. Legend: total uncertainty for BDT cut = 0.85; statistical uncertainty for BDT cut = 0

FIG. 11. The post-fit charged pion multiplicity is shown for events with $M_X < 1.7$ GeV. The uncertainties on the MC stack include all systematic uncertainties.

FIG. 13. The generator level $B \to X_u \ell^+ \nu_\ell$ distributions $E_\ell^B$, $M_X$, and $q^2$ for neutral (left) and charged (right) $B$ mesons are shown. The black histogram shows the merged hybrid model, composed of resonant and non-resonant contributions. For more details on the used models and how the hybrid $B \to X_u \ell^+ \nu_\ell$ signal sample is constructed, see Section II.
The most discriminating variables are $M_{\mathrm{miss}}^2$, the $B_{\mathrm{sig}}$ vertex fit probability, and $M_{\mathrm{miss},D^*}^2$. Figures 15, 16 and 18 show the agreement between recorded and simulated events, taking into account the full uncertainties detailed in Section V. More details about the BDT can be found in Section III C.

FIG. 14. The shapes of the input variables for the $B \to X_c \ell^+ \nu_\ell$ background suppression BDT are shown. For details and definitions see Section III C.

Figure 19 compares the charged pion multiplicity at different stages in the selection. This variable is not used in the signal extraction, but its modeling is tested to make sure that the $B \to X_u \ell^+ \nu_\ell$ fragmentation probabilities cannot bias the final result. The agreement in the signal enriched region with $M_X < 1.7$ GeV after the BDT selection is fair, but shows some deviations. We correct the generator level charged pion multiplicity to match the $n_{\pi^\pm}$ observed in this selection by assigning the non-resonant $B \to X_u \ell^+ \nu_\ell$ events a correction weight as a function of the true charged pion multiplicity. After this procedure the agreement is perfect and we use the difference in the reconstruction efficiency as an uncertainty from the pion fragmentation on the partial branching fractions and $|V_{ub}|$ (cf. Section V).
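A reweighting of this kind can be sketched as below. This is a simplified illustration and not the procedure of the analysis: it assumes that the ratio of the normalized multiplicity distributions in data and simulation can be used directly as a weight in the true multiplicity, and it ignores background subtraction and migrations between true and reconstructed multiplicity. All names are hypothetical.

```python
import numpy as np

def multiplicity_weights(n_data, n_mc, max_mult):
    """Per-multiplicity weights w(n) = f_data(n) / f_mc(n), n = 0..max_mult."""
    edges = np.arange(max_mult + 2) - 0.5        # one bin per integer value
    h_data, _ = np.histogram(n_data, bins=edges)
    h_mc, _ = np.histogram(n_mc, bins=edges)
    f_data = h_data / h_data.sum()
    f_mc = h_mc / h_mc.sum()
    weights = np.ones(max_mult + 1)              # leave empty MC bins unweighted
    filled = f_mc > 0
    weights[filled] = f_data[filled] / f_mc[filled]
    return weights

def event_weights(true_mult, is_nonresonant, weights):
    """Apply w(n_true) to non-resonant events only; all others keep weight 1."""
    true_mult = np.asarray(true_mult, dtype=int)
    mask = np.asarray(is_nonresonant, dtype=bool)
    out = np.ones(len(true_mult))
    out[mask] = weights[np.clip(true_mult[mask], 0, len(weights) - 1)]
    return out

# Toy example with multiplicities 0..2.
w = multiplicity_weights(n_data=[0, 1, 1, 2], n_mc=[0, 0, 1, 2], max_mult=2)
ev_w = event_weights(true_mult=[0, 1, 2, 1],
                     is_nonresonant=[True, True, False, False],
                     weights=w)
```

The efficiency difference with and without these weights would then serve as the fragmentation uncertainty described in the text.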

FIG. 19. The charged pion multiplicity ($n_{\pi^\pm}$) is compared between data and the simulation: (top left) for all events prior to the BDT selection; (top right) for all events after the BDT selection; (bottom left) for the signal enriched region of $M_X < 1.7$ GeV; (bottom right) for the same region but after rescaling the non-resonant contributions such that the $n_{\pi^\pm}$ fragmentation probability matches the one observed in data.

Figures 20 and 21 show the nuisance parameter pulls for each fit category k and bin i defined as

FIG. 20. The nuisance parameter pulls on the 1D fits of $M_X$, $q^2$, and $E_\ell^B$ with and without $M_X < 1.7$ GeV events separated out are shown from left to right.

FIG. 21. The nuisance parameter pulls on the 2D fit of $M_X : q^2$ are shown.

FIG. 23. The $B \to X_u \ell^+ \nu_\ell$ efficiency after the BDT selection is shown as a function of the reconstructed kinematic variables ($E_\ell^B$, $M_X$, $q^2$) used in the signal extraction. The bottom right plot shows the efficiencies in the bins of $M_X : q^2$ and the binning can be found in the text. The uncertainties are statistical only.

TABLE I. Branching fractions for $B \to X_u \ell^+ \nu_\ell$ and $B \to X_c \ell^+ \nu_\ell$ background processes that were used are listed. More details on the applied corrections can be found in the text. We neglect the small contribution from $B^+ \to D$

TABLE II. The selection efficiencies for $B \to X_u \ell^+ \nu_\ell$ signal, $B \to X_c \ell^+ \nu_\ell$ and for data are listed after the reconstruction of the $B_{\mathrm{tag}}$ and lepton candidate. The nominal selection requirement on the BDT classifier $\mathcal{O}_{\mathrm{BDT}}$ is 0.85. The other two requirements were introduced to test the stability of the result, cf. Section VIII.

TABLE III. The binning choices of the five fit scenarios are given.

TABLE IV. The relative uncertainties on the extracted $B \to X_u \ell^+ \nu_\ell$ partial branching fractions are shown. For definitions of additive and multiplicative errors, see text.

TABLE V. The fitted signal yields in ($\eta_{\mathrm{sig}}$) and outside ($\eta_{\mathrm{sig-out}}$) the measured phase-space regions, the background yields ($\eta_{\mathrm{bkg}}$) and the product of tagging and selection efficiency are listed. The number of analyzed data events, $n_{\mathrm{data}}$, is also listed.

TABLE VI. The theory rates $\Delta\Gamma(B \to X_u \ell^+ \nu_\ell)$ from various theory calculations are listed. The rates are given in units of