First observation of $B\!\to \bar{D}_1(\to\bar{D}\pi^+\pi^-)\ell^+\nu_\ell$ and measurement of the $B\!\to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ and $B\!\to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ branching fractions with hadronic tagging at Belle

We report measurements of the ratios of branching fractions for $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ relative to $B \to \bar{D}^*\ell^+\nu_\ell$ decays with $\ell = e, \mu$. These results are obtained from a data sample that contains $772 \times 10^6 B\bar{B}$ pairs collected near the $\Upsilon(4S)$ resonance with the Belle detector at the KEKB asymmetric energy $e^+e^-$ collider. Fully reconstructing both $B$ mesons in the event, we obtain \begin{align*} \frac{B(B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell)}{B(B^0 \to D^{*-}\ell^+\nu_\ell)}&= (7.24\pm0.36\pm0.12)\%\ ,\\ \frac{B(B^+ \to D^-\pi^+\ell^+\nu_\ell)}{B(B^+ \to \bar{D}^{*0}\ell^+\nu_\ell)}&= (6.78\pm0.24\pm0.15)\%\ ,\\ \frac{B(B^0 \to \bar{D}^{*0}\pi^-\ell^+\nu_\ell)}{B(B^0 \to D^{*-}\ell^+\nu_\ell)}&= (11.10\pm0.48\pm0.20)\%\ ,\\ \frac{B(B^+ \to D^{*-}\pi^+\ell^+\nu_\ell)}{B(B^+ \to \bar{D}^{*0}\ell^+\nu_\ell)}&= (9.50\pm0.33\pm0.27)\%\ ,\\ \frac{B(B^0 \to D^-\pi^+\pi^-\ell^+\nu_\ell)}{B(B^0 \to D^{*-}\ell^+\nu_\ell)}&= (2.91\pm0.37\pm0.25)\%\ ,\\ \frac{B(B^+ \to \bar{D}^0\pi^+\pi^-\ell^+\nu_\ell)}{B(B^+ \to \bar{D}^{*0}\ell^+\nu_\ell)}&= (3.10\pm0.26\pm0.21)\%\ ,\\ \frac{B(B^0 \to D^{*-}\pi^+\pi^-\ell^+\nu_\ell)}{B(B^0 \to D^{*-}\ell^+\nu_\ell)}&= (1.03\pm0.43\pm0.18)\%\ ,\\ \frac{B(B^+ \to \bar{D}^{*0}\pi^+\pi^-\ell^+\nu_\ell)}{B(B^+ \to \bar{D}^{*0}\ell^+\nu_\ell)}&= (1.25\pm0.27\pm0.15)\%\ , \end{align*} where the uncertainties are statistical and systematic, respectively. The invariant mass spectra of the $D\pi$, $D^*\pi$, and $D\pi\pi$ systems are studied. Branching fraction products are extracted, among them the first observations of $B(B^0 \to D_1^-\ell^+\nu_\ell) \times B(D_1^- \to D^-\pi^+\pi^-) = (0.102\pm0.013\pm0.009)\%$ and $B(B^+ \to \bar{D}_1^0\ell^+\nu_\ell) \times B(\bar{D}_1^0 \to \bar{D}^0\pi^+\pi^-) = (0.105\pm0.011\pm0.008)\%$.



I. INTRODUCTION
Semileptonic decays of $B$ mesons are an important tool for precision measurements of the Cabibbo-Kobayashi-Maskawa matrix elements $V_{cb}$ and $V_{ub}$ [1,2]. The latest determinations of $|V_{cb}|$ from inclusive semileptonic $B \to X_c\ell^+\nu_\ell$ decays, with $X_c$ being a charmed hadronic state that is not explicitly reconstructed, differ from those using the exclusive semileptonic decays $B \to \bar{D}\ell^+\nu_\ell$ and $B \to \bar{D}^*\ell^+\nu_\ell$ by about $2.4\sigma$ [3]. The measured sum of the exclusive $B \to \bar{D}^{(*)}\ell^+\nu_\ell$, $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$, and $B^+ \to D_s^{(*)-}K^+\ell^+\nu_\ell$ rates accounts for only $(85 \pm 2)\%$ [3] of the inclusive rate for semileptonic $B$ decays to charm final states.
Semileptonic decays of $B$ mesons can also be used for other precision tests of the electroweak sector of the standard model, such as lepton flavor universality. An example is the ratio $R(D^{(*)})$ of the branching fractions $B(B \to \bar{D}^{(*)}\tau^+\nu_\tau)$ and $B(B \to \bar{D}^{(*)}\ell^+\nu_\ell)$ ($\ell = e, \mu$), for which a persistent $3\sigma$ deviation between the standard model expectation [4] and the combined experimental results [5] from BABAR [6,7], Belle [8–10], and LHCb [11,12] has been observed. Important backgrounds in these processes are the decays $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$. The former accounts for part of the missing exclusive rate described above. The latter proceeds predominantly via $B \to \bar{D}^{**}\ell^+\nu_\ell$, $D^{**} \to D^{(*)}\pi$, where the $D^{**}$ is an orbitally excited ($L = 1$) charmed meson.

The $D^{**}$ mass spectrum contains two doublets of states that have light-quark total angular momenta of $j_q = \frac{1}{2}$ and $j_q = \frac{3}{2}$ [13]. The spin-0 state $D_0^*$ can only decay to $D\pi$, and the spin-1 states $D_1$ and $D_1'$ only via $D^{**} \to D^*\pi$. The spin-2 state $D_2^*$ can decay both into $D\pi$ and $D^*\pi$. The $D^{**}$ masses are not far from threshold. Since the $j_q = \frac{3}{2}$ states ($D_1$ and $D_2^*$) have a significant $D$-wave component, these states are narrow and were observed with a typical width of about 20 MeV/$c^2$ [14–16]. On the other hand, the states with $j_q = \frac{1}{2}$ decay mainly via $S$-wave and are therefore expected to be broad resonances with a width of several hundred MeV/$c^2$ [13,17]. The decay rate of semileptonic $B$ decays to the $j_q = \frac{1}{2}$ states is observed to be similar to the rate to the $j_q = \frac{3}{2}$ doublet, while model calculations predict a substantially smaller rate to the $j_q = \frac{1}{2}$ doublet [18].

The decay modes with one charged pion in the final state have been measured by BABAR [16] and in a previous Belle analysis [19]. For the $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ channel, only a BABAR result [20] with limited statistical precision is available so far.
The results of these three measurements are listed in Table I. The current measurement improves upon the

TABLE I: Previous results of $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ branching fraction measurements by BABAR [16,20] and Belle [19]. The first uncertainty is statistical, the second systematic, and the third comes from the branching fraction of the normalization mode.

II. EXPERIMENTAL APPARATUS AND DATA
The Belle detector is a large-solid-angle magnetic spectrometer. Its innermost component is a silicon vertex detector (SVD). A 50-layer central drift chamber (CDC) provides tracking and charged particle identification (PID) information using specific ionization measurements. An array of aerogel threshold Cherenkov counters (ACC), a barrel-like arrangement of time-of-flight scintillation counters (TOF) in the central part, and an electromagnetic calorimeter (ECL) composed of CsI(Tl) crystals provide further PID information. These detector components are located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. The iron return yoke located outside of the coil is instrumented to detect $K_L^0$ mesons and to identify muons (KLM). The detector's $z$-axis is defined to be anti-parallel to the $e^+$ beam. More details about the detector can be found in Ref. [22].
Electron candidates are identified using the ratio between the energy deposited in the ECL and their track momentum, the ECL shower shape, the matching between the track and the ECL cluster, the energy loss in the CDC, and the number of photoelectrons in the ACC [23]. Muons are identified based on their penetration range and transverse scattering in the KLM [24]. Charged kaons and pions are identified by a combination of the energy loss in the CDC, the Cherenkov light in the ACC, and the time of flight in the TOF.
A data sample corresponding to an integrated luminosity of $L_{\rm on} = 711$ fb$^{-1}$, collected with the Belle detector at the KEKB asymmetric-energy $e^+e^-$ collider [25] operating at the $\Upsilon(4S)$ resonance at $\sqrt{s} = 10.58$ GeV, is used for the measurement. The sample contains $772 \times 10^6$ $B\bar{B}$ pairs. A further data sample corresponding to an integrated luminosity of $L_{\rm off} = 89$ fb$^{-1}$ taken slightly below the resonance, at $\sqrt{s} = 10.52$ GeV, is used for background templates. These two samples are referred to as the on-resonance and off-resonance samples, respectively.
We use a sample of simulated $B\bar{B}$ background Monte Carlo (MC) events generated with EvtGen [26]. This sample contains six times as many events as the Belle collision data. The full detector simulation is based on GEANT3 [27]. Final-state radiation is simulated with the PHOTOS package [28]. The $B \to \bar{D}^*\ell^+\nu_\ell$ decays are simulated using the HQET2 model [29] of EvtGen. For the $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ decay modes, dedicated MC samples of $73 \times 10^6$ events for each of five transitions (via $D_0^*$ and $D_2^*$ for $B \to \bar{D}\pi\ell^+\nu_\ell$, and via $D_1$, $D_1'$, and $D_2^*$ for $B \to \bar{D}^*\pi\ell^+\nu_\ell$) are generated with the ISGW2 model [30]. A signal MC sample of $50 \times 10^6$ events of $B \to \bar{D}\pi^+\pi^-\ell^+\nu_\ell$ is used for simulating the $B \to \bar{D}_1\ell^+\nu_\ell$ decay modes. An MC sample of $25 \times 10^6$ events of $B \to \bar{D}^*\pi^+\pi^-\ell^+\nu_\ell$ is used for simulating the $B \to \bar{D}_1\ell^+\nu_\ell$ decay modes. Both are simulated with the ISGW2 model.
Data-MC efficiency differences due to a variety of sources are corrected. A more detailed description can be found in Sec. VI.

III. MEASUREMENT OVERVIEW
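The $B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell$ branching fraction is measured relative to that of $B^0 \to D^{*-}\ell^+\nu_\ell$,
\begin{equation*}
\frac{B(B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell)}{B(B^0 \to D^{*-}\ell^+\nu_\ell)} = \frac{N_{\rm sig}^{\bar{D}^0\pi^-\ell^+}/\epsilon_{\rm sig}^{\bar{D}^0\pi^-\ell^+}}{N_{\rm sig}^{D^{*-}\ell^+}/\epsilon_{\rm sig}^{D^{*-}\ell^+}}\ ,
\end{equation*}
which reduces systematic uncertainties due to data-MC differences and external branching fraction values of the charm modes, as they largely cancel in the ratio. Similar expressions are used for the other measured decays. Here $N_{\rm sig}$ is the number of signal candidates and $\epsilon_{\rm sig}$ is the corresponding signal efficiency. Branching fractions of $B \to \bar{D}^{(*)}n\pi\ell^+\nu_\ell$ ($n = 1, 2$) are also reported, after multiplying by the $B(B^+ \to \bar{D}^{*0}\ell^+\nu_\ell) = (5.58 \pm 0.22)\%$ and $B(B^0 \to D^{*-}\ell^+\nu_\ell) = (4.97 \pm 0.12)\%$ averages from the Particle Data Group (PDG) [3].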
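As a worked illustration of the last step, the following minimal sketch converts a measured ratio into an absolute branching fraction by multiplying with the normalization value. The function name is hypothetical, and keeping the normalization uncertainty as a separate third term is an assumption modelled on the convention described for Table I; it is not taken from the analysis code.

```python
def absolute_branching_fraction(ratio, stat, syst, norm, norm_err):
    """Scale a branching-fraction ratio by the normalization mode.

    All inputs are plain fractions (0.0724 means 7.24%). Returns the
    central value and the statistical, systematic, and normalization
    uncertainties, kept separate rather than combined (an assumption).
    """
    central = ratio * norm
    return central, stat * norm, syst * norm, ratio * norm_err


# Ratio for B0 -> D0bar pi- l+ nu from the abstract, normalized to
# B(B0 -> D*- l+ nu) = (4.97 +- 0.12)% as quoted in the text.
central, stat, syst, norm_unc = absolute_branching_fraction(
    0.0724, 0.0036, 0.0012, 0.0497, 0.0012
)
print(f"B = ({100 * central:.3f} +- {100 * stat:.3f} "
      f"+- {100 * syst:.3f} +- {100 * norm_unc:.3f})%")
```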

IV. EVENT SELECTION
The Belle data are converted into the Belle II format [31], and the particle and event reconstruction is performed within the basf2 framework [32,33] of the Belle II experiment.

A. Common selection requirements
In each event two $B$ meson candidates are reconstructed. One of the $B$ meson candidates ($B_{\rm tag}$) is reconstructed with the Full Event Interpretation (FEI) algorithm, which follows a hierarchical approach: final-state particle candidates are combined into intermediate particles until the final $B_{\rm tag}$ candidates are formed. More than 100 explicit decay channels, leading to $O(10\,000)$ distinct decay chains, are reconstructed. For each final-state particle and for each decay channel of an intermediate particle, a multivariate classifier is trained that estimates the probability that each decay chain correctly describes the true process. In this analysis only hadronically reconstructed decay chains are considered. The $B_{\rm tag}$ meson candidates are required to have a beam-constrained mass $M_{\rm bc} = \sqrt{(E_{\rm c.m.}/c^2)^2 - (P_{B_{\rm tag}}/c)^2} > 5.27$ GeV/$c^2$ and to satisfy a requirement on the energy difference $\Delta E = E_{B_{\rm tag}} - E_{\rm c.m.}$. Here $E_{\rm c.m.}$ is half of the center-of-mass (c.m.) energy of the beams, and $P_{B_{\rm tag}}$ and $E_{B_{\rm tag}}$ are the momentum and energy of the $B_{\rm tag}$ meson in the c.m. frame, respectively. The FEI signal probability of $B_{\rm tag}$ candidates is required to be greater than 0.5%. Distributions of $M_{\rm bc}$, $\Delta E$, and the signal probability are shown in Fig. 1. We take into account that the composition of decay modes reconstructed by FEI differs between data and MC: the ratio of the relative abundances in data and MC for each decay mode is used to correct this effect.

The other $B$ meson candidate ($B_{\rm sig}$) is reconstructed in the decays of interest. The first selection step of the $B_{\rm sig}$ reconstruction is the requirement of at least one electron or muon candidate in the event. For both lepton types, the lepton is required to have a minimum momentum of $p > 300$ MeV/$c$. The lepton's point of closest approach to the KEKB interaction point (IP) is required to be within $|dz| < 2$ cm of the IP along the detector axis and within $dr < 0.5$ cm in the transverse plane.
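The two $B_{\rm tag}$ kinematic variables used above can be evaluated as in the following minimal sketch. It works in natural units (GeV for energies, GeV/$c$ for momenta, so factors of $c$ drop out); the function names are hypothetical and only the $M_{\rm bc}$ threshold quoted in the text is applied.

```python
import math

E_BEAM = 10.58 / 2  # half of the on-resonance c.m. energy, in GeV


def beam_constrained_mass(e_beam, p_btag):
    """M_bc in GeV/c^2 from the beam energy and the B_tag momentum
    magnitude, both measured in the c.m. frame."""
    return math.sqrt(e_beam ** 2 - p_btag ** 2)


def delta_e(e_btag, e_beam):
    """Difference between the B_tag energy and the beam energy (GeV)."""
    return e_btag - e_beam


def passes_mbc(e_beam, p_btag, threshold=5.27):
    """Apply the M_bc > 5.27 GeV/c^2 requirement from the text."""
    return beam_constrained_mass(e_beam, p_btag) > threshold


# A correctly reconstructed B meson carries roughly 0.33 GeV/c of
# momentum in the c.m. frame, so M_bc lands near the B mass.
print(beam_constrained_mass(E_BEAM, 0.33), passes_mbc(E_BEAM, 0.33))
```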
The polar angle of muon candidate tracks is required to be within the range $45^\circ < \theta_\mu < 145^\circ$ to ensure that the tracks enter the KLM. Electron tracks need to be within the CDC acceptance $17^\circ < \theta_e < 150^\circ$. This implies that the track is within the ECL acceptance. The likelihood ratio $R_\mu = L_\mu/(L_\mu + L_{\rm hadron})$, where $L_\mu$ and $L_{\rm hadron}$ are the likelihoods for muons and charged hadrons, is required to be greater than 0.9 for muon candidates. This selection has an average efficiency of 89% with a pion misidentification rate of 1.4% for muons with momenta between 1 and 3 GeV/$c$ [24]. For electron candidates the likelihood ratio $R_e$ is required to be greater than 0.8. This requirement has an average efficiency of 92% at a pion misidentification rate of 0.25% for electrons with momenta between 1 and 3 GeV/$c$ [23].
The four-momentum of the closest photon that is within a $5^\circ$ cone around an electron's momentum direction is added to that of the electron candidate to correct for bremsstrahlung. The photon's energy is required to be greater than 50, 75, and 100 MeV for the barrel ($32.2^\circ < \theta_\gamma < 128.7^\circ$), forward ($12.4^\circ < \theta_\gamma < 31.4^\circ$), and backward end cap ($130.7^\circ < \theta_\gamma < 155.1^\circ$) regions of the ECL, respectively.
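The bremsstrahlung recovery described above can be sketched as follows. This is an illustrative implementation, not the analysis code: "closest" is assumed to mean the smallest opening angle to the electron direction, photons are given as three-momenta in GeV with the $z$-axis as polar axis, and all function names are hypothetical.

```python
import math

# (theta_min_deg, theta_max_deg, minimum energy in GeV), from the text.
ECL_REGIONS = [
    (12.4, 31.4, 0.075),    # forward end cap
    (32.2, 128.7, 0.050),   # barrel
    (130.7, 155.1, 0.100),  # backward end cap
]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def passes_energy_cut(photon):
    """Region-dependent minimum photon energy; photons are massless."""
    energy = norm(photon)
    theta = math.degrees(math.acos(photon[2] / energy))
    return any(lo < theta < hi and energy > e_min
               for lo, hi, e_min in ECL_REGIONS)


def opening_angle_deg(a, b):
    cos_angle = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def brems_corrected(electron_p, electron_e, photons, cone_deg=5.0):
    """Add the closest accepted photon inside the cone, if there is one."""
    accepted = [g for g in photons
                if passes_energy_cut(g)
                and opening_angle_deg(electron_p, g) < cone_deg]
    if not accepted:
        return tuple(electron_p), electron_e
    best = min(accepted, key=lambda g: opening_angle_deg(electron_p, g))
    momentum = tuple(a + b for a, b in zip(electron_p, best))
    return momentum, electron_e + norm(best)
```

A photon that is collinear with the electron but below the energy threshold is skipped in this sketch, and the next-closest accepted photon inside the cone is used instead.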
Kaons and pions are identified using the ratio $R_{K/\pi} = L_K/(L_K + L_\pi)$ between the combined ACC, TOF, and CDC likelihood for a kaon and the sum of the kaon and pion likelihoods [34]. Kaons (pions) are required to have $R_{K/\pi} > 0.6$ ($R_{K/\pi} < 0.4$), which has an average efficiency of 92% (93.5%). Kaon and pion candidate tracks must satisfy $dr < 2$ cm and $|dz| < 5$ cm.
Neutral kaon candidates are reconstructed from $\pi^+\pi^-$ pairs. The invariant mass of $K_S^0$ candidates is required to be in the range 482 to 514 MeV/$c^2$, which is about $4\sigma$ around the nominal mass, where $\sigma$ corresponds to the mass resolution. For low- ($p < 0.5$ GeV/$c$), medium- ($0.5 \le p \le 1.5$ GeV/$c$), and high-momentum ($p > 1.5$ GeV/$c$) $K_S^0$ candidates, we require that the pion daughters have $dr > 0.05$ cm, 0.03 cm, and 0.02 cm, respectively. The angle in the transverse plane between the vector from the interaction point to the $K_S^0$ vertex and the $K_S^0$ flight direction is required to be less than 0.3 rad, 0.1 rad, and 0.03 rad for low-, medium-, and high-momentum candidates, respectively; the separation distance along the beam axis of the two pion trajectories at their point of closest approach is required to be below 0.8 cm, 1.8 cm, and 2.4 cm, respectively. For medium- (high-) momentum $K_S^0$ candidates, we require the flight length in the transverse plane to be greater than 0.08 cm (0.22 cm). Finally, a mass-constrained vertex fit of the $K_S^0$ candidate must converge.

Neutral pion candidates are reconstructed from pairs of photons, which must satisfy the same region-dependent energy requirements as the photons considered for the bremsstrahlung correction described above. The diphoton invariant mass is required to be between 120 and 150 MeV/$c^2$, which corresponds to about $5\sigma$ around the nominal mass. A mass-constrained fit of the two photons is required to converge. Photons are not allowed to be shared between $\pi^0$ candidates. To eliminate duplicates, all $\pi^0$ candidates of an event are sorted according to the energy of the most energetic daughter photon (and then, if needed, that of the second most energetic daughter). Any $\pi^0$ candidate that shares photons with one that appears earlier in this list is removed.
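The $\pi^0$ duplicate removal can be sketched as below. The candidate representation is hypothetical, and one interpretive choice is an assumption: a candidate is dropped only if it shares a photon with an earlier candidate that was itself kept. A stricter reading, in which sharing with any earlier candidate triggers removal, would differ for chains of overlapping candidates.

```python
def remove_shared_photon_pi0s(candidates):
    """Keep pi0 candidates so that no photon is used twice.

    Each candidate is (photon_id_a, energy_a, photon_id_b, energy_b).
    Candidates are ordered by the energy of their most energetic
    photon, with ties broken by the second photon's energy.
    """
    def sort_key(cand):
        e_high, e_low = sorted((cand[1], cand[3]), reverse=True)
        return (-e_high, -e_low)

    kept, used_photons = [], set()
    for cand in sorted(candidates, key=sort_key):
        photon_ids = {cand[0], cand[2]}
        if photon_ids & used_photons:
            continue  # shares a photon with a higher-ranked candidate
        kept.append(cand)
        used_photons |= photon_ids
    return kept


example = [
    ("g2", 0.3, "g3", 0.5),
    ("g1", 0.9, "g2", 0.3),
    ("g3", 0.5, "g4", 0.2),
]
print(remove_shared_photon_pi0s(example))
```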
Charged kaons, charged and neutral pions, and $K_S^0$ mesons are combined to form neutral and charged $D$ meson candidates. A total of 10 hadronic $D^0$ modes with the final states $K^-\pi^+$, $K^-\pi^+\pi^0$, $K^-\pi^+\pi^+\pi^-$, $K_S^0\pi^+\pi^-$, $K^+K^-$, $K_S^0\pi^0$, $K_S^0\pi^+\pi^-\pi^0$, $\pi^+\pi^-$, $K^-\pi^+\pi^-\pi^+\pi^0$, and $\pi^+\pi^-\pi^0$, and 9 hadronic $D^+$ modes with the final states $K_S^0\pi^+$, $K_S^0\pi^+\pi^-\pi^+$, $K^-\pi^+\pi^+$, $K^-K^+\pi^+$, $K^-\pi^+\pi^+\pi^0$, $K_S^0\pi^+\pi^0$, $K_S^0K^+$, $\pi^+\pi^0$, and $\pi^+\pi^-\pi^+$ are considered. For $D$ final states with at least one $\pi^0$ the $D$-candidate invariant mass is required to be within $\pm 25$ MeV/$c^2$ of the nominal value [3], while the requirement for all other modes is $\pm 15$ MeV/$c^2$, which corresponds to about $3\sigma$. A global decay chain fit [35] is performed for all $D$ modes except for $D^0 \to K_S^0\pi^0$. In these fits, mass constraints are applied to the $D$ candidate as well as to $K_S^0$ and $\pi^0$ candidates. If the fit fails, the candidate is discarded.

Neutral $D^0$ meson candidates are combined with $\pi^0$ candidates to form $D^{*0}$ candidates. The mass difference between the $D^{*0}$ and the $D^0$ candidates is restricted to be between 138.9 and 145.5 MeV/$c^2$, and a global decay chain fit with mass constraints on the $D^{*0}$, $D^0$, $K_S^0$, and $\pi^0$ must converge. Similarly, $D^{*+}$ meson candidates are formed from combinations of $D^+$ and $\pi^0$ as well as $D^0$ and $\pi^+$. The invariant mass of the $D^{*+}$ candidates is allowed to deviate from the nominal mass by no more than 3 MeV/$c^2$. Again, a global decay chain fit is performed with mass constraints on the $D^{*+}$, $D^0$ or $D^+$, $K_S^0$, and $\pi^0$.
B. Specific event selection of $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ decays

By combining one $D^{(*)}$ meson candidate, one lepton candidate, and one charged pion candidate, $B$ meson candidates are formed. The invariant mass $M(D\pi)$ is required to be below 2.8 GeV/$c^2$, as the potential $D^{**}$ states are expected to be at lower masses. We also require $M(D\pi)$ to be above 2.05 GeV/$c^2$ to suppress $B \to \bar{D}^*\ell^+\nu_\ell$ contributions.
C. Specific event selection of $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ decays

Further $B$ meson candidates are formed from $D^{(*)}$ meson candidates, one lepton candidate, and two oppositely charged pion candidates. The PID requirement for the muons is tightened to $R_\mu > 0.97$, which implicitly also removes all muon candidates with momenta lower than 500 MeV/$c$. To suppress the background from hadronically decaying $B$ meson events, the missing momentum $p_{\rm miss}$ of the event is required to be greater than 200 MeV/$c$. Here $p_{\rm miss}$ is the difference between the total momentum of the initial colliding beam particles and the combined momentum of all visible particles measured in the center-of-mass frame. Analogously, the missing energy $E_{\rm miss}$ is defined as the energy difference between the center-of-mass energy and the sum over the energies of the $B_{\rm sig}$ and $B_{\rm tag}$ candidates.
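The missing-momentum and missing-energy definitions above, together with the combination $U = E_{\rm miss} - p_{\rm miss}c$ used later for signal extraction, can be sketched as follows. The sketch assumes natural units and that all four-vectors are already given in the c.m. frame, where the colliding beams carry zero net momentum; the function name and input layout are hypothetical.

```python
import math


def missing_quantities(sqrt_s, visible):
    """Return (E_miss, p_miss, U) in GeV.

    sqrt_s:  c.m. energy of the collision.
    visible: list of (E, px, py, pz) for the reconstructed candidates,
             all in the c.m. frame.
    """
    e_visible = sum(v[0] for v in visible)
    p_visible = [sum(v[i] for v in visible) for i in (1, 2, 3)]
    # The beams have zero net momentum in the c.m. frame, so the missing
    # momentum is minus the visible momentum; only its magnitude is used.
    p_miss = math.sqrt(sum(x * x for x in p_visible))
    e_miss = sqrt_s - e_visible
    return e_miss, p_miss, e_miss - p_miss


# One massless particle with E = |p| = 1 GeV is missing, so U is ~0.
e_miss, p_miss, u = missing_quantities(
    10.58, [(5.29, 0.0, 0.0, 0.3), (4.29, -0.6, -0.8, -0.3)]
)
print(e_miss, p_miss, u)
```

For a correctly reconstructed semileptonic decay with a single missing neutrino, $U$ peaks near zero, which is why it serves as the fit variable.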
To suppress $D^{*-}$ contributions to the final state in $B^+ \to \bar{D}^0\pi^+\pi^-\ell^+\nu_\ell$, a veto is implemented: the combined invariant mass of the neutral $D$ meson and the pion with the opposite charge to that of the $B$ meson is required to be above 2.05 GeV/$c^2$. The contamination from $B^+ \to D^{*-}\pi^+\ell^+\nu_\ell$ with $D^{*-} \to \bar{D}^0\pi^-$ is reduced by 50% with this veto. However, the pions used in the reconstruction of the $B_{\rm sig}$ meson candidate can also arise from the decay of the $B_{\rm tag}$ meson. Therefore, a second veto is implemented: the invariant mass of each $\pi^+$ used in the $B_{\rm tag}$ reconstruction combined with the signal $D^0$ is required to be greater than 2.05 GeV/$c^2$.
The $B \to \bar{D}\pi^+\pi^-\ell^+\nu_\ell$ mode has much more background than the $B \to \bar{D}^*\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ modes. In order to increase the sensitivity of this channel, a boosted decision tree (BDT) [36] is used to further reduce the background. The following 25 input variables are used in the BDT: $E_{\rm extra}$, the unaccounted energy in the ECL; $R_1$–$R_4$, the ratios of the first, second, third, and fourth to the zeroth Fox-Wolfram moments [37]; $H_0$–$H_4$, the harmonic moments of zeroth to fourth order with respect to the thrust axis (Chapter 9.3 of Ref. [38]); $C_0$–$C_8$, the momentum flow in nine cones of $10^\circ$ around the thrust axis [39]; the sphericity and the aplanarity of the event (Chapter 9.3 of Ref. [38]); the thrust value of the event and the cosine of the polar angle of the thrust axis (Chapter 9.3 of Ref. [38]); the number of tracks used in the $B_{\rm tag}$ reconstruction; and the number of neutral clusters used in the $B_{\rm tag}$ reconstruction.

The BDT is trained with signal MC simulations and off-resonance data, as most of the remaining background originates from $e^+e^- \to q\bar{q}$ ($q = u, d, s, c$) "continuum" events. The signal MC is divided into $N$ subsamples, each containing the number of expected candidates in the full Belle dataset based on the branching fraction results of the BABAR measurement [20]. For each subsample an individual BDT is trained using the other $N - 1$ subsamples, such that the size of the training sample is maximized while keeping it independent from the sample that the BDT is applied to, thereby avoiding bias. Separate BDTs are trained for the $B^+$ and $B^0$ modes. The distribution of all BDT output classifiers combined is shown in Fig. 2. The BDT output variable is required to exceed a threshold, indicated by the vertical line in Fig. 2; candidates to the right of the vertical line are retained.
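The train-on-$N-1$, apply-to-one scheme described above can be sketched as a simple index bookkeeping exercise. The sketch below only builds the split plan; the classifier itself is not shown, and the interleaved assignment of events to subsamples is an assumption (the text fixes the subsample size via the expected signal yield instead).

```python
def cross_training_plan(n_events, n_subsamples):
    """Return a list of (training_indices, application_indices) pairs.

    Each subsample serves once as the application sample, while the
    classifier applied to it is trained on all other subsamples.
    """
    folds = [list(range(k, n_events, n_subsamples))
             for k in range(n_subsamples)]
    plan = []
    for k, application in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k
                    for i in fold]
        plan.append((training, application))
    return plan


for training, application in cross_training_plan(10, 3):
    print(len(training), "training events ->", application)
```

Because the training and application indices never overlap, no event is classified by a BDT that saw it during training.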

D. $\Upsilon(4S)$ selection
A total of 12 $B^+$ modes and 12 $B^0$ modes are reconstructed. Each $B_{\rm sig}$ candidate and $B_{\rm tag}$ candidate are combined to form an $\Upsilon(4S)$ candidate. In the combinations the electric charge must be conserved, but the flavor of two neutral $B$ mesons is allowed to be the same. Candidates with tracks that are not assigned to the $\Upsilon(4S)$ candidate are rejected. In events that contain more than one $\Upsilon(4S)$ candidate, a single candidate is selected, as follows.
First, the $B_{\rm tag}$ candidate with the highest FEI signal probability is selected. If multiple $B_{\rm sig}$ candidates remain, the $D^*$ mode is preferred over the $D$ mode, since otherwise an additional $\pi^0$ candidate would be left in the event. In some events, candidates are reconstructed in both the one-pion and the two-pion modes. As the distribution of $U = E_{\rm miss} - p_{\rm miss}c$ is used for signal extraction, if $U$ is between $-0.1$ and 0.1 GeV for at least one candidate in both decay modes, all candidates in the event are rejected. If there are still multiple candidates, only the one with the smallest difference between the $D^{(*)}$ candidate mass and the nominal mass is retained.
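The candidate selection steps above can be sketched as a sequence of filters. This is a minimal illustration under stated assumptions: the dictionary keys (`fei_prob`, `has_dstar`, `n_pions`, `u`, `dm`) are hypothetical, and the steps are applied strictly in the order in which the text lists them.

```python
def select_candidate(candidates):
    """Pick one Upsilon(4S) candidate per event, or None if rejected.

    Each candidate is a dict with the hypothetical keys
      fei_prob:  FEI signal probability of the B_tag,
      has_dstar: True if B_sig contains a D* (rather than a D),
      n_pions:   1 or 2 additional pions in B_sig,
      u:         E_miss - p_miss in GeV,
      dm:        D(*) candidate mass minus nominal mass in GeV.
    """
    if not candidates:
        return None
    # 1) Keep the B_tag with the highest FEI signal probability.
    best_prob = max(c["fei_prob"] for c in candidates)
    remaining = [c for c in candidates if c["fei_prob"] == best_prob]
    # 2) Prefer D* modes over D modes.
    if any(c["has_dstar"] for c in remaining):
        remaining = [c for c in remaining if c["has_dstar"]]
    # 3) Reject the event if both the one-pion and two-pion modes have
    #    a candidate inside the U signal window.
    modes_in_window = {c["n_pions"] for c in remaining
                       if -0.1 < c["u"] < 0.1}
    if {1, 2} <= modes_in_window:
        return None
    # 4) Keep the candidate closest to the nominal D(*) mass.
    return min(remaining, key=lambda c: abs(c["dm"]))
```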
For each signal MC mode, the efficiency is taken to be the fraction of correctly reconstructed candidates. A weighted average of the efficiencies based on the relative abundances of the $D^{**}$ states reported by the PDG [3] is taken as the final efficiency value. The ratios between the efficiencies of the signal and normalization modes are given in Table II.

V. EXTRACTION OF SIGNAL YIELDS
The number of signal candidates is determined with an unbinned extended maximum likelihood fit of $U = E_{\rm miss} - p_{\rm miss}c$. The probability density function (PDF) used to describe the $U$ distribution is constructed from templates based on the MC.
For the fit of the $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ sample the total PDF consists of four (three) components: $P_{\rm sig}^{D^{(*)}}$ is the signal PDF, and $P_{\rm fd}^{D}$, $P_{B\bar{B}}^{D^{(*)}}$, and $P_{\rm cont}^{D^{(*)}}$ are the PDFs describing the feeddown, $B\bar{B}$ background, and continuum background, respectively. Feeddown describes a contribution from $B \to \bar{D}^*\ell^+\nu_\ell$ that shows up in the $B \to \bar{D}\ell^+\nu_\ell$ modes if the neutral pion of a $D^{*0} \to D^0\pi^0$ or a $D^{*+} \to D^+\pi^0$ decay is missed in the reconstruction. Due to the missing $\pi^0$ it is shifted to higher values in the $U$ distribution. Thus, this contribution can be separated and used to improve the sensitivity of the branching fraction measurement.
The fraction of the $B\bar{B}$ component among the total background, $f_{\rm bkg}$, is constrained to the values estimated in simulation. A simultaneous fit of $B \to \bar{D}\ell^+\nu_\ell$ and $B \to \bar{D}^*\ell^+\nu_\ell$ is performed, where the total $B \to \bar{D}^*\ell^+\nu_\ell$ yield $N_{\rm sig}^{D^*}$ is determined as the sum of the signal and feeddown components, which are related via their efficiencies $\epsilon_{\rm MC}$. The templates used to construct the PDFs are created with 125 bins between $-0.5$ and 2 GeV. Separate PDFs are used for the electron and muon modes except for the continuum PDF, which is created from the combined sample of the two modes as their distributions are statistically compatible with each other.
The width of the signal peak in the $U$ distribution differs between data and MC, even after all known corrections are applied. To compensate for this effect, the signal PDFs are constructed by convolving the signal-MC templates with a Gaussian whose mean and width are floating in the fit to data. Independent widths are used for the electron and muon modes. The fitted $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ signal and background yields are listed in Table III.

B. Fit of $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ samples

A simultaneous fit to the $U$ distributions of 16 categories, which split the full sample according to the $B$ flavor ($B^0$ vs $B^+$), the $D$ mode ($D^0/D^+$ vs $D^{*0}/D^{*+}$), the number of pion daughters ($D\pi$ vs $D\pi\pi$), and the lepton mode ($e$ vs $\mu$), is performed. This allows several background sources to be constrained directly from the data, as described below. All templates are constructed with 120 bins in the range $-1$ to 2 GeV.
[Figure: Candidates / (0.02 GeV) for the data. The MC shapes, normalized according to the result of the fit, are also shown.]

The $B^+ \to D^-\pi^+\ell^+\nu_\ell$ fit PDF consists of five components: signal, feeddown, misreconstructed $B \to \bar{D}^{**}\ell^+\nu_\ell$ background, other $B\bar{B}$, and continuum. The signal template $P_{\rm sig}^{D^-\pi^+\ell^+}$ is obtained from signal MC, in which the $D\pi$ system is produced in $D_0^*$ and $D_2^*$ decays. The template for the misreconstructed $B \to \bar{D}^{**}\ell^+\nu_\ell$ background is obtained from 14 different MC samples, including $B^+ \to \bar{D}^0\pi^+\pi^-\ell^+\nu_\ell$. These events constitute background due to misreconstructed signal candidates, swapping of final-state particles between the $B_{\rm sig}$ and $B_{\rm tag}$ candidates, or events with $D^{**} \to D^{(*)}\pi^0$. The composition of the different $D^{**}$ states is set to the world averages of these modes [3]. The yields of the $B \to \bar{D}^{**}\ell^+\nu_\ell$ background components, $N_{D^{**},i}$, are calculated as the product of the terms listed in Table IV. The other $B\bar{B}$ background is taken from a generic $b \to c$ MC sample with six times the luminosity of data. Off-resonance data are used to model the continuum PDF $P_{\rm off}^{D^-\pi^+\ell^+}$. The yield of the continuum contribution in the fit is constrained via the ratio of the on- and off-resonance luminosities. The ratio is allowed to float in the fit within a Gaussian constraint with a width of 1%. This accounts for the uncertainty in the determination of the luminosity ratio.
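The continuum constraint described above can be illustrated with the following minimal sketch. It scales an off-resonance yield by the luminosity ratio from Sec. II and adds a Gaussian penalty term to a negative log-likelihood. Interpreting the 1% width as relative to the nominal ratio is an assumption, as is the omission of any other scale factor; the function names are hypothetical.

```python
L_ON = 711.0   # on-resonance integrated luminosity in fb^-1
L_OFF = 89.0   # off-resonance integrated luminosity in fb^-1
NOMINAL_RATIO = L_ON / L_OFF


def continuum_yield(n_off, lumi_ratio=NOMINAL_RATIO):
    """Expected on-resonance continuum yield from the off-resonance count."""
    return n_off * lumi_ratio


def lumi_ratio_penalty(lumi_ratio, nominal=NOMINAL_RATIO, rel_width=0.01):
    """Gaussian constraint term added to the negative log-likelihood."""
    sigma = rel_width * nominal
    return 0.5 * ((lumi_ratio - nominal) / sigma) ** 2


print(f"luminosity ratio = {NOMINAL_RATIO:.2f}")
print(f"expected continuum for 25 off-resonance candidates: "
      f"{continuum_yield(25):.1f}")
```

A one-standard-deviation shift of the ratio costs 0.5 units of negative log-likelihood in this formulation.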
The fit model for $B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell$ is constructed similarly. The signal composition is 71% $B^0 \to D^{*-}$ […] the $B^0 \to D^{*-}\ell^+\nu_\ell$ signal yield $N_{\rm sig}^{D^{*-}\ell^+}$ from the fit described in Sec. V A and the ratio of efficiencies of the $B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell$ and $B^0 \to D^{*-}\ell^+\nu_\ell$ selections.
The $B \to \bar{D}^*\pi\ell^+\nu_\ell$ fit models consist of only four components, as there is no feeddown. The strategy for modelling background from $B \to \bar{D}^{**}\ell^+\nu_\ell$ is the same as for $B \to \bar{D}\pi\ell^+\nu_\ell$. The signal PDF template is obtained from signal MC, in which the $D^*\pi$ final state is produced in $D_1$ decay, $D_1'$ decay, and $D_2^*$ decay at the same proportions as the feeddown components in $B \to \bar{D}\pi\ell^+\nu_\ell$ described above.
For $B \to \bar{D}\pi^+\pi^-\ell^+\nu_\ell$ the fit model contains four components (signal, feeddown, other $B\bar{B}$, continuum), while for $B \to \bar{D}^*\pi^+\pi^-\ell^+\nu_\ell$ only three components are needed, as there is no feeddown. Following the findings of the BABAR measurement [20], the signal is assumed to proceed via a $D_1$ resonance for the $D\pi\pi$ modes and via a $D_1$ resonance for the $D^*\pi\pi$ modes. The $B \to \bar{D}^*\pi^+\pi^-\ell^+\nu_\ell$ templates are constructed with 30 bins in the range $-0.5$ to 1 GeV.
The plots of the data and fit results are shown in Figs. 7 to 14. The signal and background yields are summarized in Table V.

VI. SYSTEMATIC UNCERTAINTIES
The systematic uncertainties mainly arise from the fit modeling, the uncertainty on the branching fraction values of the normalization mode $B \to \bar{D}^*\ell^+\nu_\ell$ and the charm modes, and the hadron PID. For the two-pion modes there are additional sizable systematic uncertainties from the BDT and from the limited size of the MC sample used to calculate the signal efficiency of the selection. The various considered sources of systematic uncertainties are described below. Their numerical values are summarized in Tables VI and VII.

a. MC statistics fit model: To account for the finite size of the MC samples used to produce the PDF templates, alternative fit PDFs are created by varying the bin contents of each PDF template according to a Poisson distribution. This is done 1000 times, and after each variation the fit to the collision data is performed with the new set of templates. It is checked that the pull distributions are unbiased, where the pull is defined as the difference between the yields using the varied fit PDF and the nominal yields divided by the statistical uncertainty of the new yields. The spread of the new signal yields (about 1% for the one-pion modes, 5 to 20% for the two-pion modes) is used as an estimate for the systematic uncertainty.
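The template variation in item a can be sketched as below. Only the fluctuation step is shown; refitting the data with each varied template set is outside the scope of the sketch. The Poisson sampler is a stdlib-only stand-in (Knuth's multiplication method for small means, a rounded Gaussian approximation above a mean of 50), which is an implementation choice made here and not part of the analysis.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer with the given mean."""
    if mean <= 0:
        return 0
    if mean > 50:
        # Gaussian approximation for large means (assumption of this sketch).
        return max(0, round(rng.gauss(mean, math.sqrt(mean))))
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def toy_templates(bin_contents, n_toys=1000, seed=1):
    """Return n_toys copies of a template with Poisson-varied bins."""
    rng = random.Random(seed)
    return [[poisson_sample(n, rng) for n in bin_contents]
            for _ in range(n_toys)]


toys = toy_templates([0.0, 4.0, 400.0], n_toys=2000, seed=7)
print(toys[0])
```

Each toy template would then replace the nominal one in a refit, and the spread of the resulting signal yields gives the uncertainty estimate.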
b. MC statistics signal efficiency: The uncertainty on the calculated signal efficiency ratios in Table II due to the finite size of the MC samples is propagated to the branching fractions and ratios, and assigned as systematic uncertainty.
c. Charm branching ratios: To estimate the uncertainty due to the uncertainties on the branching ratios of the charm decays, we sample each charm branching ratio 10 000 times from a Gaussian distribution with mean and width equal to the PDG central value and uncertainty [3]. It is assumed that the branching fractions for different $D$ modes are independent. For each sampled set of $D$ branching fractions, the new sum of branching fractions is calculated for the signal and normalization channels. The reconstruction efficiency is taken into account via the relative abundance of the modes. The ratio of the sums is calculated, and the spread of the resulting distribution is assigned as systematic uncertainty.
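The sampling procedure in item c can be sketched as follows. The mode names, weights, and branching-fraction values in the example are illustrative placeholders rather than PDG numbers, and the function name is hypothetical; the per-mode weights stand in for the relative abundance of each mode after reconstruction.

```python
import random
import statistics


def ratio_spread(signal_weights, norm_weights, branching_fractions,
                 n_toys=10000, seed=1):
    """Relative spread of the signal/normalization ratio of summed
    charm branching fractions under independent Gaussian variations.

    signal_weights, norm_weights: dict mode -> efficiency weight.
    branching_fractions: dict mode -> (central value, uncertainty).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_toys):
        sampled = {mode: rng.gauss(mu, sigma)
                   for mode, (mu, sigma) in branching_fractions.items()}
        numerator = sum(w * sampled[m] for m, w in signal_weights.items())
        denominator = sum(w * sampled[m] for m, w in norm_weights.items())
        ratios.append(numerator / denominator)
    return statistics.pstdev(ratios) / statistics.fmean(ratios)


# Placeholder inputs for illustration only.
bfs = {"K pi": (0.04, 0.0003), "K pi pi0": (0.14, 0.005)}
signal = {"K pi": 1.0, "K pi pi0": 0.5}
normalization = {"K pi": 1.0, "K pi pi0": 0.2}
print(ratio_spread(signal, normalization, bfs, n_toys=2000))
```

When the signal and normalization channels use the same modes with the same weights, the sampled variations cancel exactly in the ratio, which illustrates why measuring ratios reduces this uncertainty.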
d. Signal $B \to \bar{D}^{**}\ell^+\nu_\ell$ composition: The signal PDF $U$ shapes vary slightly for different intermediate $D^{**}$ states. Therefore, the overall $U$ shape depends on the $D^{**}$ composition. To estimate the signal branching-fraction uncertainty due to the uncertainties in the $D^{**}$ composition, we generate the $U$ distribution using the template of one $D^{**}$ state and then fit with the nominal signal template described in Sec. V B, whose composition is taken from Ref. [3]. The largest average difference between the generated and fitted signal yields among the tested $D^{**}$ scenarios, which varies between 0.4% and 0.8% depending on the mode, is assigned as a systematic uncertainty.
e. Lepton PID: By using $\gamma\gamma \to \ell^+\ell^-$ processes, lepton PID efficiency factors in kinematic ranges of the momentum and polar angle have been calculated (Chapter 5.4 of Ref. [38]), which correct for the difference between the selection efficiency in data and MC. The systematic uncertainties on the PID efficiency factors account for the method itself and for a possible effect from a hadronic environment, which is determined using inclusive $B \to J/\psi X$ decays. To propagate the uncertainties to the branching fractions, we sample lepton correction factors for each kinematic bin using a Gaussian around the nominal value with a width corresponding to the uncertainty of the correction factor. The average correction factor over all truth-matched signal events as well as the average correction factor over all truth-matched candidates of the normalization channels are calculated. The spread of the distribution of the ratio of the two means is taken as the systematic uncertainty due to lepton identification. This procedure is performed separately for each of the $D^{**}$ states, and the largest uncertainty per $B$ and $D^{(*)}$ mode among all $B \to \bar{D}^{**}\ell^+\nu_\ell$ modes is assigned as the systematic uncertainty.
f. Charged hadron PID: Similar to the study for the lepton PID, correction factors for the hadron PID selection requirements are sampled in bins of the momentum and polar angle to evaluate the systematic uncertainty on the branching fraction due to the uncertainties in the determination of the correction factors using inclusive $D^*$ samples (Chapter 5.4 of Ref. [38]). The average correction factors of the signal and normalization samples are calculated, then divided by each other, and the spread of the resulting distribution of ratios is interpreted as the systematic uncertainty for the hadron PID. Similar to the lepton PID described above, the largest value over the possible $D^{**}$ states is assigned as the final systematic uncertainty.
g. Tracking efficiency: For each signal and normalization mode the average track multiplicity over the various $D$ modes is determined in simulation. The difference between the signal and normalization mode average track multiplicity is multiplied by 0.35% (Chapter 15.1.1.2 of Ref. [38]), and the result is taken as systematic uncertainty due to tracking efficiency differences between data and MC. For low-momentum tracks ($p_T < 200$ MeV/$c$) an additional tracking-related systematic uncertainty is calculated. Using a $B^0 \to D^{*-}\pi^+$ sample, the slow pion efficiency is determined in six momentum bins for data and MC (Chapter 15.1.1.2 of Ref. [38]). The relative uncertainty of the ratio between the data and MC efficiencies is taken as systematic uncertainty due to low-momentum tracking. The two tracking-related systematic uncertainties are added in quadrature.
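The arithmetic of item g can be sketched as follows. The input multiplicities and the slow-pion term in the example are illustrative placeholders, the function name is hypothetical, and taking the absolute value of the multiplicity difference is an assumption made so that the result is an unsigned uncertainty.

```python
import math


def tracking_systematic(n_tracks_signal, n_tracks_norm,
                        per_track=0.0035, slow_pion=0.0):
    """Relative tracking uncertainty on a branching-fraction ratio.

    The multiplicity difference between the signal and normalization
    modes is scaled by the per-track uncertainty (0.35%), and the
    result is combined in quadrature with the slow-pion term.
    """
    multiplicity_term = abs(n_tracks_signal - n_tracks_norm) * per_track
    return math.hypot(multiplicity_term, slow_pion)


# One extra track on average in the signal mode, no slow-pion term.
print(f"{100 * tracking_systematic(5.2, 4.2):.2f}%")
```

Tracks common to the signal and normalization modes cancel in the ratio, so only the multiplicity difference contributes.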
h. $\pi^0$ efficiency: The $\pi^0$ efficiency differs between data and MC. The effect is corrected in the calculation of the signal efficiency, and the uncertainty on the ratio between the data and MC efficiencies of about 2.4% (Chapter 15.1.4 of Ref. [38]) is propagated to the systematic uncertainty of the branching fraction measurement. First, the average $\pi^0$ multiplicity for each signal and normalization mode is determined and the difference between the signal and normalization values is calculated. This difference is multiplied by the aforementioned uncertainty to obtain the systematic uncertainty due to the $\pi^0$ efficiency data-MC ratio.
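Written out, and with the notation $\langle n_{\pi^0}\rangle$ for the average $\pi^0$ multiplicity introduced here only for illustration, the prescription above amounts to
\begin{align*}
\sigma_{\pi^0} = \left|\langle n_{\pi^0}\rangle_{\mathrm{sig}} - \langle n_{\pi^0}\rangle_{\mathrm{norm}}\right| \times 2.4\%\ .
\end{align*}
For a hypothetical multiplicity difference of 0.5, this would give a relative uncertainty of 1.2%.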
i. $B \to \bar{D}^{*}\ell^+\nu_\ell$ and $B \to \bar{D}^{**}\ell^+\nu_\ell$ form factors: The $B \to \bar{D}^{**}\ell^+\nu_\ell$ MC samples are generated with the ISGW2 model [30]. A more accurate description can be achieved with the LLSW model [17]. To estimate the systematic uncertainty due to using the ISGW2 model, two-dimensional form factor weights are determined in $\omega = \frac{m_B^2 + m_{D^{**}}^2 - q^2}{2 m_B m_{D^{**}}}$ and $\cos\theta_\ell$. Here $m_B$ and $m_{D^{**}}$ are the masses of the $B$ meson and the $D^{**}$ system, $q^2$ is the four-momentum transfer squared to the lepton-neutrino system, and $\theta_\ell$ is the angle between the charged lepton and the $D$ meson. These weights are calculated separately for decays via $D_0^*$, $D_1$, $D_1'$, and $D_2^*$ mesons. The $U$ distribution is generated using the nominal ISGW2-based templates and fit with signal and feeddown templates that are reweighted with the form factor weights described above. The average difference between the fitted and generated yields over 1000 iterations of generating and fitting is calculated and divided by the generated yield ($f_{\mathrm{sig}}$).
Similarly, the simulation of the $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ modes is based on heavy quark effective theory (HQET) [29]. A reweighting in the momentum transfer and the momentum of the charged lepton is applied to account for outdated values of the CLN [40] form factor parameters $\rho^2$, $R_1$, and $R_2$. The $U$ distribution is generated with the nominal HQET-based templates and fit with the reweighted templates. The difference between the fitted and generated yields divided by the generated yield is calculated ($f_{\mathrm{norm}}$).
The difference of the ratio $f_{\mathrm{sig}}/f_{\mathrm{norm}}$ from unity is taken as the systematic uncertainty due to the form factors.
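The recoil variable $\omega$ that enters the form factor weights in item i can be evaluated directly from its definition. The short sketch below uses approximate, illustrative masses (about 5.279 GeV/$c^2$ for the $B$ meson and 2.42 GeV/$c^2$ for a narrow $D^{**}$ state); the function name `recoil_omega` is hypothetical.

```python
def recoil_omega(m_b, m_dss, q2):
    """omega = (m_B^2 + m_D**^2 - q^2) / (2 m_B m_D**), masses in GeV, q2 in GeV^2."""
    return (m_b ** 2 + m_dss ** 2 - q2) / (2.0 * m_b * m_dss)


M_B = 5.279    # approximate B meson mass in GeV (illustrative)
M_DSS = 2.42   # approximate narrow D** mass in GeV (illustrative)

# Zero recoil: q^2 takes its maximum value (m_B - m_D**)^2 and omega equals 1.
omega_min = recoil_omega(M_B, M_DSS, (M_B - M_DSS) ** 2)
# Maximum recoil for a massless lepton pair: q^2 = 0, omega is about 1.32 here.
omega_max = recoil_omega(M_B, M_DSS, 0.0)
```

The allowed $\omega$ range is therefore narrow for $D^{**}$ final states, which is one reason the weights are binned in two dimensions together with $\cos\theta_\ell$ rather than in $\omega$ alone.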
k. BDT: The BDT to suppress continuum background in $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ is trained with signal MC and off-resonance data. Differences in the input variable distributions between the signal simulation and signal events in real data might introduce a bias in the calculation of the signal efficiency. To estimate the associated uncertainty, the BDT output is calculated for the cross-check and normalization modes $B \to \bar{D}\ell^+\nu_\ell$ and $B \to \bar{D}^*\ell^+\nu_\ell$. The same requirement on the BDT output as for the $D\pi\pi$ signal-candidate selection is applied to these $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ modes, and the fit to the $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ sample described in Sec. V A is performed. The ratio between the $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ yield of this fit and the yield obtained without the BDT requirement is considered a data-based efficiency of the BDT requirement. This efficiency is compared with the signal MC efficiency of the $B \to \bar{D}^{(*)}\ell^+\nu_\ell$ samples. The largest relative difference between the data- and MC-based efficiencies among the $B \to \bar{D}\ell^+\nu_\ell$ and $B \to \bar{D}^*\ell^+\nu_\ell$ values is taken as the BDT-related systematic uncertainty. This procedure assumes that the BDT, which uses variables of the $B_{\mathrm{tag}}$ meson reconstruction and event-shape variables, is mostly independent of the $B_{\mathrm{sig}}$ meson reconstruction.
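The comparison of data-based and MC-based BDT efficiencies can be sketched as below. The yields and efficiencies are invented for illustration, the function name `bdt_systematic` is hypothetical, and normalizing the difference to the MC efficiency is an assumption, since the text does not say which efficiency is used as the reference.

```python
def bdt_systematic(modes):
    """Largest relative data/MC efficiency difference over the control modes.

    modes maps a mode name to (yield_with_bdt_cut, yield_without_cut, mc_efficiency).
    The difference is taken relative to the MC efficiency (an assumption).
    """
    rel_diffs = {}
    for name, (n_cut, n_all, eff_mc) in modes.items():
        eff_data = n_cut / n_all  # data-based efficiency of the BDT requirement
        rel_diffs[name] = abs(eff_data - eff_mc) / eff_mc
    return max(rel_diffs.values()), rel_diffs


# Hypothetical fit yields and MC efficiencies for the two control modes.
control_modes = {
    "D l nu": (900.0, 1000.0, 0.92),
    "D* l nu": (1840.0, 2000.0, 0.90),
}
systematic, per_mode = bdt_systematic(control_modes)
```

In this toy input both modes differ from the simulation by about 2%, and the larger of the two values is the one that would be assigned.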

VII. BRANCHING FRACTION RESULTS
The weighted average branching fraction ratios are calculated based on the total uncertainties. The calculation takes into account that some component uncertainties are correlated between the electron and muon modes. The results and the ratios between the electron and muon mode branching fractions are listed in Table VIII. The results are the most precise determinations of these branching fraction ratios to date (except for $B^0 \to D^{*-}\pi^+\pi^-\ell^+\nu_\ell$). All values are compatible with the previous world averages. The electron and muon values are compatible with each other within one standard deviation apart from those for $B^0 \to D^{*-}\pi^+\pi^-\ell^+\nu_\ell$. The p-value of the hypothesis that the latter are

TABLE VIII: Branching fraction ratio results and ratios between electron and muon decay modes with statistical and systematic uncertainties. The denominator for the branching fraction ratios is $B^0 \to D^{*-}\ell^+\nu_\ell$ for the $B^0$ modes and $B^+ \to \bar{D}^{*0}\ell^+\nu_\ell$ for the $B^+$ modes.

| Decay mode | Branching fraction ratio [%] | $e/\mu$ ratio |
|---|---|---|
| $B^0 \to \bar{D}^0\pi^-\ell^+\nu_\ell$ | 7.24 ± 0.36 (stat) ± 0.12 (syst) | 1.13 ± 0.11 (stat) |
| $B^+ \to D^-\pi^+\ell^+\nu_\ell$ | 6.78 ± 0.24 (stat) ± 0.15 (syst) | 1.07 ± 0.08 (stat) |
| $B^0 \to \bar{D}^{*0}\pi^-\ell^+\nu_\ell$ | 11.10 ± 0.48 (stat) ± 0.20 (syst) | 0.98 ± 0.08 (stat) |
| $B^+ \to D^{*-}\pi^+\ell^+\nu_\ell$ | 9.50 ± 0.33 (stat) ± 0.27 (syst) | 1.06 ± 0.08 (stat) |

The branching fraction ratios are converted into absolute branching fractions by multiplying them with the branching fraction of $B \to \bar{D}^*\ell^+\nu_\ell$. The results are listed in Table IX.

| Decay mode | Branching fraction [%] |
|---|---|
| $B^0 \to D^-\pi^+\pi^-\ell^+\nu_\ell$ | 0.145 ± 0.018 (stat) ± 0.013 (syst) |
| $B^+ \to \bar{D}^0\pi^+\pi^-\ell^+\nu_\ell$ | 0.173 ± 0.014 (stat) ± 0.013 (syst) |
| $B^0 \to D^{*-}\pi^+\pi^-\ell^+\nu_\ell$ | 0.051 ± 0.021 (stat) ± 0.009 (syst) |
| $B^+ \to \bar{D}^{*0}\pi^+\pi^-\ell^+\nu_\ell$ | 0.070 ± 0.015 (stat) ± 0.008 (syst) |

VIII. EXCLUSIVE $B \to \bar{D}^{**}\ell^+\nu_\ell$ BRANCHING FRACTIONS
Using the sPlot technique [41] with the implementation of Ref. [42], signal weights are assigned to each event based on the fit to the $U$ distribution. This allows the background contribution to the $m(D\pi)$, $m(D^*\pi)$, and $m(D\pi\pi)$ distributions to be statistically subtracted, and the signal-only distribution to be studied. We perform weighted unbinned maximum likelihood fits to the invariant mass distributions. The uncertainty calculation is based on Ref. [43].
For the $B \to \bar{D}\pi\ell^+\nu_\ell$ modes the PDG reports decays via the $D_0^*$ and $D_2^*$ resonances. These two contributions are parametrized with Breit-Wigner functions that are convolved with a Gaussian distribution. The width of the Gaussian is fixed from simulations to 3.4 MeV/$c^2$. The peak positions and widths of the $D_0^*$ and $D_2^*$ resonances are allowed to float in the fit, but they are constrained within Gaussian distributions using their world averages and corresponding uncertainties [3]. In a second fit the peak positions and widths are fixed to the results from the first fit. The difference in the statistical uncertainties between the two fits is used to single out the uncertainty introduced by the Gaussian constraint. It is interpreted as a systematic uncertainty. The weighted $m(D\pi)$ distribution (see Fig. 15) shows that a third component must be added to the fit model. Here, we choose an exponential distribution. The yields, which are listed in Table X, are converted into branching fractions using Eq. (1).
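One common way to write a weighted fit with Gaussian constraints, given here only as a sketch since the text does not spell out the likelihood, is
\begin{align*}
-\ln L(\vec{\theta}) = -\sum_{i} w_i \ln f(m_i;\vec{\theta}) + \sum_{j} \frac{(\theta_j - \mu_j)^2}{2\sigma_j^2}\ ,
\end{align*}
where $w_i$ is the sPlot weight of event $i$, $f$ is the normalized mass model, and the sum over $j$ runs over the constrained peak positions and widths with world averages $\mu_j$ and uncertainties $\sigma_j$. The symbols $w_i$, $f$, $\mu_j$, and $\sigma_j$ are notation introduced for this illustration. Fixing the $\theta_j$ in the second fit removes the contribution of the constraint terms to the parameter uncertainties, which is what allows the two fits to be compared.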
The statistical uncertainty is extracted directly from the fit, while the systematic uncertainty is the sum in quadrature of the relative uncertainties of the inclusive branching fractions reported in Table IX and the uncertainties introduced by the limited knowledge of the $D^{**}$ peak positions and widths described above. In the fit to the $m(\bar{D}^0\pi^-)$ distribution the yield of the $D_0^{*-}$ component is compatible with zero. Therefore, instead of calculating a branching fraction, an upper limit at 90% confidence level (CL) is set. We create 2000 new data samples by bootstrapping [44] the original data (randomly selecting events, each with its corresponding weight, while allowing repetition of the events). The $\bar{D}^0\pi^-$ mass fit is performed for each sample. The 90% CL upper limit on the yield is the value that is higher than that found in 90% of the samples in which a positive $D_0^{*-}$ yield is obtained. This yield is then converted into the upper limit on the branching fraction. The results for the decays via the $D_2^*$ resonance are compatible with the world averages. They constitute the most precise measurements of these branching fractions to date. On the other hand, the value for $B(B^+ \to \bar{D}_0^{*0}\ell^+\nu_\ell) \times B(\bar{D}_0^{*0} \to D^-\pi^+)$ is significantly smaller than previous measurements. This applies even more so to the $B^0$ mode, where no contribution could be found in this analysis.

| Decay mode | Yield | Significance | Branching fraction [%] |
|---|---|---|---|
| $B^+ \to \bar{D}_0^{*0}\ell^+\nu_\ell$ with $\bar{D}_0^{*0} \to D^-\pi^+$ | | | 0.054 ± 0.022 (stat) ± 0.005 (syst) |
| $B^+ \to \bar{D}_2^{*0}\ell^+\nu_\ell$ with $\bar{D}_2^{*0} \to D^-\pi^+$ | 590 ± 39 | 24.9 | 0.163 ± 0.011 (stat) ± 0.007 (syst) |
| other $B^+ \to D^-\pi^+\ell^+\nu_\ell$ | 520 ± 70 | - | |

Three $D^{**}$ resonances are known for the $D^*\pi$ final state: $D_1$, $D_1'$, and $D_2^*$. The three components are parametrized with Breit-Wigner functions convolved with a Gaussian. The shape parameters of the two narrow resonances $D_1$ and $D_2^*$ are constrained within Gaussian distributions to their world averages [3], while the peak position and width of the broad $D_1'$ resonance are fixed to their world averages.
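The bootstrap upper limit on the $D_0^{*-}$ yield described above can be sketched as follows. This is a minimal illustration under stated assumptions: the real procedure refits the $\bar{D}^0\pi^-$ mass distribution for every sample, whereas the stand-in `window_yield` below simply sums the event weights in a mass window. Both function names, the toy events, and the exact quantile convention are hypothetical.

```python
import math
import random


def bootstrap_upper_limit(events, fit_yield, n_samples=2000, cl=0.90, seed=7):
    """Upper limit on a yield from bootstrap resampling of weighted events.

    events    : list of (mass, weight) tuples; each event keeps its own weight
    fit_yield : callable returning the fitted yield of one resampled data set
    """
    rng = random.Random(seed)
    positive = []
    for _ in range(n_samples):
        # Resample with replacement, keeping the original sample size.
        sample = rng.choices(events, k=len(events))
        y = fit_yield(sample)
        if y > 0:  # only samples with a positive yield enter the quantile
            positive.append(y)
    if not positive:
        raise ValueError("no bootstrap sample gave a positive yield")
    positive.sort()
    # Smallest yield that is at least as large as a fraction cl of the
    # positive yields (one possible quantile convention, an assumption).
    idx = min(len(positive) - 1, math.ceil(cl * len(positive)) - 1)
    return positive[idx]


def window_yield(sample, low=2.2, high=2.45):
    """Hypothetical stand-in for the mass fit: sum of weights in a window."""
    return sum(w for m, w in sample if low < m < high)


toy_events = [(2.30, 0.8), (2.35, -0.2), (2.40, 1.1), (2.50, -0.4), (2.32, 0.5)]
limit = bootstrap_upper_limit(toy_events, window_yield, n_samples=200)
```

Restricting the quantile to samples with a positive yield follows the wording of the text; it makes the limit more conservative than a quantile taken over all samples.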
Instead of fitting $m(D^*\pi)$ directly, the invariant mass of the $D^*$ is subtracted. This allows the feeddown component to be incorporated conveniently as well: when the invariant mass of the $D$ meson is subtracted from $m(D\pi)$, the peaks align. We perform the fit in the range 0.2 to 0.8 GeV/$c^2$. The data and the overlaid fit projections are shown in Fig. 16. The yields of the three components and the resulting branching fractions are listed in Table XI. The systematic uncertainty is dominated by the shape uncertainties. It is determined by fitting twice, once with the shape parameters floating and once with them fixed. The results for the decays via the narrower $D_1$ and $D_2^*$ resonances are compatible with previous measurements and the world averages. For the decay via the wider $D_1'$ resonance the measured branching fractions are 35% (50%) lower than the world average in the $B^0$ ($B^+$) mode.
The weighted unbinned maximum likelihood fit to the $m(D\pi\pi)$ distribution is performed in the range 2.15 to 5 GeV/$c^2$ (see Fig. 17). Initially, the fit model consists of a single Gaussian and a first-order polynomial. The fitted peak position and width are compatible with the $D_1$ resonance for the $B^0$ and $B^+$ modes. Therefore, the Gaussian component is interpreted as $B^0 \to D_1^-\ell^+\nu_\ell$ with $D_1^- \to D^-\pi^+\pi^-$ and $B^+ \to \bar{D}_1^0\ell^+\nu_\ell$ with $\bar{D}_1^0 \to \bar{D}^0\pi^+\pi^-$, respectively. The peaking component is then replaced with a Breit-Wigner function convolved with a Gaussian. The shape parameters of the Breit-Wigner are set to the PDG values, but allowed to float within a Gaussian constraint. We find 103 ± 13 events for the $B^0$ mode and 197 ± 20 events for the $B^+$ mode. By comparing the log-likelihood with that of a fit in which the $D_1$ yield is fixed to zero, the statistical significance is determined to be 17.3 for the $B^0$ mode and 25.1 for the $B^+$ mode.
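The text does not state how the log-likelihood comparison is converted into a significance. A common choice for a single additional yield parameter, assumed here for illustration only, is
\begin{align*}
S = \sqrt{2\left(\ln L_{\max} - \ln L_{0}\right)}\ ,
\end{align*}
where $L_{\max}$ is the likelihood of the nominal fit and $L_0$ is the likelihood of the fit with the $D_1$ yield fixed to zero. Both symbols are notation introduced for this sketch.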
The remaining signal events (42 ± 13 events in the $B^0$ mode and 131 ± 20 events in the $B^+$ mode), which are parametrized with the polynomial, can either be a non-resonant decay process or a decay via a very broad resonance, such as the $D_0^*$ or $D_1'$. However, with our statistical power we can only state that there must be at least one additional process besides the decay via the $D_1$ resonance, but cannot characterize it further. The $D_1$ yields are converted into the following branching fractions:
\begin{align*}
B(B^0 \to D_1^-\ell^+\nu_\ell) \times B(D_1^- \to D^-\pi^+\pi^-) &= (0.102 \pm 0.013\,(\mathrm{stat}) \pm 0.009\,(\mathrm{syst}))\%\ , \tag{9}\\
B(B^+ \to \bar{D}_1^0\ell^+\nu_\ell) \times B(\bar{D}_1^0 \to \bar{D}^0\pi^+\pi^-) &= (0.105 \pm 0.011\,(\mathrm{stat}) \pm 0.008\,(\mathrm{syst}))\%\ .
\end{align*}
This is the first observation of these decay modes.

IX. CONCLUSION
In conclusion, using hadronic tagging, we have measured the $B \to \bar{D}^{(*)}\pi\ell^+\nu_\ell$ and $B \to \bar{D}^{(*)}\pi^+\pi^-\ell^+\nu_\ell$ branching fractions, achieving the highest precision to date (except for $B^0 \to D^{*-}\pi^+\pi^-\ell^+\nu_\ell$). These results were obtained from a data sample that contains $772 \times 10^6$ $B\bar{B}$ pairs collected near the $\Upsilon(4S)$ resonance with the Belle detector at the KEKB asymmetric energy $e^+e^-$ collider. All values are compatible with the previous world averages. Furthermore, the mass spectra of the hadronic final state particles were studied after statistically subtracting the background contributions. We have extracted several exclusive $B \to \bar{D}^{**}\ell^+\nu_\ell$ branching fractions, including the first observations of $B \to \bar{D}_1\ell^+\nu_\ell$ with $\bar{D}_1 \to \bar{D}\pi^+\pi^-$.

X. ACKNOWLEDGMENTS