Measurement of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ with a semileptonic tagging method

The experimental results on the ratios of branching fractions $\mathcal{R}(D) = {\cal B}(\bar{B} \to D \tau^- \bar{\nu}_{\tau})/{\cal B}(\bar{B} \to D \ell^- \bar{\nu}_{\ell})$ and $\mathcal{R}(D^*) = {\cal B}(\bar{B} \to D^* \tau^- \bar{\nu}_{\tau})/{\cal B}(\bar{B} \to D^* \ell^- \bar{\nu}_{\ell})$, where $\ell$ denotes an electron or a muon, show a long-standing discrepancy with the Standard Model predictions and might hint at a violation of lepton flavor universality. We report a new simultaneous measurement of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$, based on a data sample containing $772 \times 10^6$ $B\bar{B}$ events recorded at the $\Upsilon(4S)$ resonance with the Belle detector at the KEKB $e^+ e^-$ collider. In this analysis the tag-side $B$ meson is reconstructed in a semileptonic decay mode and the signal-side $\tau$ is reconstructed in a purely leptonic decay. The measured values are $\mathcal{R}(D)= 0.307 \pm 0.037 \pm 0.016$ and $\mathcal{R}(D^*) = 0.283 \pm 0.018 \pm 0.014$, where the first uncertainties are statistical and the second are systematic. These results agree with the Standard Model predictions within $0.2$, $1.1$, and $0.8$ standard deviations for $\mathcal{R}(D)$, $\mathcal{R}(D^*)$, and their combination, respectively. This work constitutes the most precise measurements of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ performed to date as well as the first result for $\mathcal{R}(D)$ based on a semileptonic tagging method.

Semitauonic $B$ meson decays, involving the transition $b \to c \tau \nu_\tau$, are sensitive probes for physics beyond the Standard Model (SM). A deviation of the branching fractions of these processes from the SM predictions would signal a violation of lepton flavor universality, which enforces equal couplings of the gauge bosons to the three lepton generations. Indeed, many models beyond the SM postulate new interactions with enhanced couplings to the third generation. Among such new mediators, charged Higgs bosons, which appear in supersymmetry [1] and other models with two Higgs doublets [2], may contribute measurably to the $b \to c \tau \nu_\tau$ decay rate because of the large masses of the $\tau$ lepton and the $b$ quark. Similarly, leptoquarks [3], which carry both lepton and baryon numbers, may also contribute to this process.
Ratios of branching fractions, in which the denominator is the average of the electron and muon modes, are typically measured instead of the absolute branching fractions of $\bar{B} \to D^{(*)} \tau^- \bar{\nu}_\tau$. This reduces common systematic uncertainties, such as those due to the detection efficiency, the magnitude of the quark-mixing matrix element $|V_{cb}|$, and the semileptonic decay form factors. Hereafter, $\bar{B} \to D^{(*)} \tau^- \bar{\nu}_\tau$ [4] and $\bar{B} \to D^{(*)} \ell^- \bar{\nu}_\ell$ are referred to as the signal and normalization modes, respectively. The SM calculations of these ratios, performed by several groups [5][6][7][8], are averaged by HFLAV [9] to obtain $\mathcal{R}(D) = 0.299 \pm 0.003$ and $\mathcal{R}(D^*) = 0.258 \pm 0.005$.
Semitauonic $B$ decays were first observed by Belle in 2007 [10], with subsequent studies reported by Belle [11][12][13][14], BaBar [15], and LHCb [16,17]. Excluding the result presented in this Letter, the averages of the experimental results are $\mathcal{R}(D) = 0.407 \pm 0.039 \pm 0.024$ and $\mathcal{R}(D^*) = 0.306 \pm 0.013 \pm 0.007$ [9], where the first uncertainty is statistical and the second is systematic. These values exceed the SM predictions by $2.1\sigma$ and $3.0\sigma$, respectively, where $\sigma$ denotes the standard deviation. A combined analysis of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ that takes correlations into account finds a deviation from the SM prediction of approximately $3.8\sigma$ [9]. A discrepancy of this size calls for complementary and more precise measurements.
Measurements at the $e^+e^-$ "B-factory" experiments Belle and BaBar are commonly performed by first reconstructing one of the $B$ mesons in the $\Upsilon(4S) \to B\bar{B}$ decay, denoted $B_{\rm tag}$, with a dedicated tagging algorithm. So far, simultaneous measurements of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ at Belle and BaBar have used hadronic tagging methods on both $B^0$ and $B^+$ decays [12,15], while only $\mathcal{R}(D^{*+})$ has been measured with a semileptonic tagging method [13]. In this Letter, we report the first measurement of $\mathcal{R}(D)$ using the semileptonic tagging method. We also update our measurement of $\mathcal{R}(D^*)$ by combining $B^0$ and $B^+$ decays and by using a more efficient tagging algorithm. Our previous measurement of $\mathcal{R}(D^{*+})$ with a semileptonic tagging method is therefore superseded by this work.
We use the full $\Upsilon(4S)$ data sample containing $772 \times 10^6$ $B\bar{B}$ events recorded with the Belle detector [18] at the KEKB $e^+e^-$ collider [19]. Belle was a general-purpose magnetic spectrometer consisting of a silicon vertex detector (SVD), a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter (ECL) comprising CsI(Tl) crystals. These components were located inside a superconducting solenoid coil that provided a 1.5 T magnetic field. An iron flux-return yoke located outside the coil was instrumented to detect $K^0_L$ mesons and muons (KLM). The detector is described in detail elsewhere [18].
To determine the reconstruction efficiency and probability density functions (PDFs) for signal, normalization, and background modes, we use Monte Carlo (MC) simulated events generated with the EvtGen event generator [20]. The detector response is simulated with the GEANT3 package [21].
Semileptonic $B \to D^{(*)} \ell \nu$ decays are generated with the HQET2 EvtGen package, based on the CLN parametrization [22]. The measured values of the model parameters have been updated since our MC sample was generated. We therefore apply an event-by-event correction factor, given by the ratio of the differential decay rate computed with the updated CLN parameters to that computed with the parameters used in the MC. For the MC samples of $B \to D^{**} \ell \nu$ decays, we use the ISGW2 EvtGen package, based on the quark model described in Ref. [23]. This model has been superseded by the LLSW model [24], so we weight events with a correction factor given by the ratio of the analytic LLSW predictions to the MC distributions generated with ISGW2. Here, $D^{**}$ denotes the orbitally excited states $D_1$, $D_2^*$, $D_1'$, and $D_0^*$. We consider $D^{**}$ decays to a $D^{(*)}$ and a pion, a $\rho$ or an $\eta$ meson, or a pair of pions. The corresponding branching fractions are based on quantum-number, phase-space, and isospin arguments. The inclusive $\Upsilon(4S) \to B\bar{B}$ MC sample corresponds to about 10 times the integrated luminosity of the $\Upsilon(4S)$ data sample, and the dedicated $B \to D^{**} \ell \nu$ MC sample to about 5 times.
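The event-by-event correction amounts to weighting each simulated event by a ratio of decay-rate shapes evaluated at its generated kinematics. The sketch below is a minimal illustration under stated assumptions. The two toy rate functions of a recoil-like variable $w$ are hypothetical stand-ins, not the CLN, ISGW2, or LLSW expressions. Both rates are normalised so that only the shape changes; the text does not say whether the analysis also rescales the overall rate.

```python
def integrate(f, lo, hi, n=2000):
    """Trapezoidal integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


def make_event_weight(rate_mc, rate_updated, lo, hi):
    """Return weight(x) = pdf_updated(x) / pdf_mc(x) for a generated value x.

    Both differential rates are normalised over [lo, hi], so the weights
    reshape the distribution without changing its integral.
    """
    norm_mc = integrate(rate_mc, lo, hi)
    norm_updated = integrate(rate_updated, lo, hi)

    def weight(x):
        return (rate_updated(x) / norm_updated) / (rate_mc(x) / norm_mc)

    return weight


# Hypothetical toy shapes in the recoil variable w (NOT real form factors).
W_MIN, W_MAX = 1.0, 1.5


def toy_rate_mc(w):
    return 2.0 - w


def toy_rate_updated(w):
    return 1.0 + 0.5 * (w - 1.0)


weight = make_event_weight(toy_rate_mc, toy_rate_updated, W_MIN, W_MAX)
```

Multiplying the MC density by these weights reproduces the updated density, so weighted MC histograms follow the new model. The same construction applies to the ISGW2-to-LLSW correction for $B \to D^{**} \ell \nu$.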
The $B_{\rm tag}$ is reconstructed in the $D \ell \nu$ and $D^* \ell \nu$ channels, where $\ell = e, \mu$, using a hierarchical algorithm based on boosted decision trees (BDT) [25]. The BDT classifier assigns to each $B_{\rm tag}$ candidate a probability of being a well-reconstructed $B$ meson. Its output ranges from 0 to 1, with well-reconstructed candidates having the highest values. We select $B_{\rm tag}$ candidates with a classifier output greater than $10^{-1.5}$. We suppress $B \to D^* \tau (\to \ell \nu \nu) \nu$ events on the $B_{\rm tag}$ side by applying a selection on $\cos\theta_{B,D^{(*)}\ell}$. This variable is the cosine of the angle between the momenta of the $B$ meson and the $D^{(*)}\ell$ system in the $\Upsilon(4S)$ rest frame, computed under the assumption that only one massless particle is not reconstructed:
$$\cos\theta_{B,D^{(*)}\ell} = \frac{2 E_{\rm beam} E_{D^{(*)}\ell} - m_B^2 - m_{D^{(*)}\ell}^2}{2 |\vec{p}_B| |\vec{p}_{D^{(*)}\ell}|}.$$
Here $E_{\rm beam}$ is the beam energy, and $E_{D^{(*)}\ell}$, $\vec{p}_{D^{(*)}\ell}$, and $m_{D^{(*)}\ell}$ are the energy, momentum, and mass of the $D^{(*)}\ell$ system, respectively. The quantities $m_B$ and $|\vec{p}_B|$ are the nominal $B$ meson mass [26] and momentum, respectively. All quantities are evaluated in the $\Upsilon(4S)$ rest frame. Correctly reconstructed $B \to D^{(*)} \ell \nu$ decays are expected to have $\cos\theta_{B,D^{(*)}\ell}$ between $-1$ and $+1$. In contrast, correctly reconstructed as well as misreconstructed $B \to D^{(*)} \tau \nu$ decays generally have values below $-1$ because of the additional missing particles. To account for detector resolution effects, we require $-2.0 < \cos\theta_{B,D^{(*)}\ell} < 1.0$ for the $B_{\rm tag}$.
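The $\cos\theta_{B,D^{(*)}\ell}$ formula can be evaluated directly from the reconstructed $D^{(*)}\ell$ four-momentum. The sketch below works in natural units (GeV) in the $\Upsilon(4S)$ rest frame. It assumes that $|\vec{p}_B|$ is derived from the beam energy as $\sqrt{E_{\rm beam}^2 - m_B^2}$. The function names are illustrative, and the masses in any usage are placeholders rather than the values used in the analysis.

```python
import math


def cos_theta_b_y(e_beam, e_y, p_y, m_y, m_b):
    """cos(theta_{B,Y}) for a visible system Y = D(*) l.

    Assumes a single massless missing particle; natural units (GeV),
    all quantities in the Upsilon(4S) rest frame.
    """
    p_b = math.sqrt(e_beam**2 - m_b**2)  # |p_B| from the beam energy (assumption)
    return (2.0 * e_beam * e_y - m_b**2 - m_y**2) / (2.0 * p_b * p_y)


def passes_tag_window(cos_theta, lo=-2.0, hi=1.0):
    """B_tag requirement -2.0 < cos(theta_{B,D(*)l}) < 1.0."""
    return lo < cos_theta < hi
```

The relation follows from requiring the missing particle to be massless: $0 = (p_B - p_Y)^2 = m_B^2 + m_Y^2 - 2(E_B E_Y - \vec{p}_B \cdot \vec{p}_Y)$ with $E_B = E_{\rm beam}$. Solving for $\vec{p}_B \cdot \vec{p}_Y$ and dividing by $|\vec{p}_B||\vec{p}_Y|$ gives the expression above. Extra missing particles make $m_{\rm miss}^2 > 0$, which pushes the value below $-1$.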
In each event with a selected $B_{\rm tag}$ candidate, we search for the opposite-flavor signature $D^{(*)}\ell$ among the remaining tracks and calorimeter clusters. Because we only reconstruct purely leptonic tau decays, $\tau^- \to \ell^- \bar{\nu}_\ell \nu_\tau$, signal and normalization decays share this visible final state. We define four disjoint data samples, denoted $D^+\ell^-$, $D^0\ell^-$, $D^{*+}\ell^-$, and $D^{*0}\ell^-$. Charged-particle tracks are reconstructed with the SVD and CDC. Their point of closest approach to the interaction point must lie within 5.0 cm along the direction of the $e^+$ beam and within 2.0 cm in the plane perpendicular to it. These requirements do not apply to pions from $K^0_S$ decays. Electrons are identified by combining:
the specific ionization ($dE/dx$) in the CDC;
the ratio of the cluster energy in the ECL to the track momentum measured with the CDC;
the response of the ACC;
the cluster shape in the ECL;
the match between the positions of the cluster and the track at the ECL.
To recover bremsstrahlung photons from electrons, we add to the electron momentum the four-momentum of each photon detected within a cone of 0.05 rad around the original track direction. Muons are identified by the track penetration depth and hit distribution in the KLM. Charged kaons are identified by combining the $dE/dx$ measured in the CDC, the flight time measured with the TOF, and the response of the ACC. We do not apply any particle identification criteria to charged pion candidates.
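The bremsstrahlung recovery adds nearby photons to the electron four-momentum. The following is a minimal sketch, assuming four-momenta are `(E, px, py, pz)` tuples in GeV. It uses the electron's reconstructed momentum direction as a stand-in for the original track direction. The function names are hypothetical, and the sketch does not handle a photon lying in the cones of two electrons.

```python
import math


def opening_angle(a, b):
    """Angle in radians between two 3-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def recover_bremsstrahlung(electron, photons, cone=0.05):
    """Add every photon within `cone` rad of the electron direction.

    Four-momenta are (E, px, py, pz) tuples; the electron's momentum
    direction stands in for the original track direction.
    """
    direction = electron[1:]
    corrected = list(electron)
    for gamma in photons:
        if opening_angle(direction, gamma[1:]) < cone:
            corrected = [c + g for c, g in zip(corrected, gamma)]
    return tuple(corrected)
```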
Candidate $K^0_S$ mesons are formed by combining two oppositely charged tracks under the pion mass hypothesis. We require their invariant mass to lie within $\pm 15$ MeV/$c^2$ of the nominal $K^0$ mass [26], which corresponds to approximately seven times the reconstructed mass resolution. Further selection is performed with an algorithm based on a neural network [27].
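The $K^0_S$ mass requirement can be sketched as follows. Both tracks are assigned the charged-pion mass, and the pair invariant mass is compared with the nominal $K^0$ mass. The mass constants are rounded PDG values quoted for illustration. The charge check and the neural-network selection are not modelled, and the function names are hypothetical.

```python
import math

M_PI_CHARGED = 0.13957  # charged-pion mass, GeV/c^2 (rounded PDG value)
M_K0 = 0.497611         # neutral-kaon mass, GeV/c^2 (PDG value)


def pair_mass(p1, p2, m=M_PI_CHARGED):
    """Invariant mass of two tracks (3-momenta in GeV/c) under mass hypothesis m."""
    e1 = math.sqrt(m * m + sum(c * c for c in p1))
    e2 = math.sqrt(m * m + sum(c * c for c in p2))
    p = [a + b for a, b in zip(p1, p2)]
    m2 = (e1 + e2) ** 2 - sum(c * c for c in p)
    return math.sqrt(max(m2, 0.0))


def is_ks_candidate(p_plus, p_minus, window=0.015):
    """Mass window |M(pi+ pi-) - m_K0| < 15 MeV/c^2 only."""
    return abs(pair_mass(p_plus, p_minus) - M_K0) < window
```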
Photons are measured as electromagnetic clusters in the ECL with no associated charged track. Neutral pions are reconstructed in the $\pi^0 \to \gamma\gamma$ channel. Their energy resolution is improved by a mass-constrained fit of the two photon candidates to the nominal $\pi^0$ mass [26]. For neutral pions from $D$ decays, we require:
both daughter photon energies greater than 50 MeV;
a photon energy asymmetry less than 0.6 in the laboratory frame;
a positive cosine of the angle between the two photons;
a $\gamma\gamma$ invariant mass within $[-15, +10]$ MeV/$c^2$ of the nominal $\pi^0$ mass, which corresponds to approximately $\pm 1.8$ times the resolution.
Low-energy $\pi^0$ candidates from $D^*$ decays are reconstructed with looser energy requirements: one photon must have an energy of at least 50 MeV, and the other at least 20 MeV. To compensate for the looser photon-energy requirement, we require a narrower diphoton invariant-mass window of $\pm 10$ MeV/$c^2$ around the nominal $\pi^0$ mass, which corresponds to approximately $\pm 1.6$ times the resolution.
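The two sets of $\pi^0$ requirements can be expressed as a single selection function. The following is a minimal sketch, assuming massless photons with laboratory-frame energies in GeV and an opening-angle cosine `cos_open`. The energy asymmetry is assumed to be $|E_1 - E_2|/(E_1 + E_2)$, since the text does not define it. The mass-constrained fit is not modelled, and the function names are hypothetical.

```python
import math

M_PI0 = 0.1349768  # neutral-pion mass, GeV/c^2 (PDG value)


def diphoton_mass(e1, e2, cos_open):
    """Invariant mass of two massless photons with lab energies e1, e2 (GeV)."""
    return math.sqrt(max(2.0 * e1 * e2 * (1.0 - cos_open), 0.0))


def select_pi0(e1, e2, cos_open, slow=False):
    """Apply the quoted pi0 requirements (energies in GeV, lab frame)."""
    dm = diphoton_mass(e1, e2, cos_open) - M_PI0
    e_hi, e_lo = max(e1, e2), min(e1, e2)
    if slow:
        # Slow pi0 from D*: one photon >= 50 MeV, the other >= 20 MeV,
        # and |dm| < 10 MeV/c^2.
        return e_hi >= 0.050 and e_lo >= 0.020 and abs(dm) < 0.010
    # pi0 from D: both photons > 50 MeV, asymmetry < 0.6 (assumed definition),
    # cos(opening angle) > 0, and -15 < dm < +10 MeV/c^2.
    asymmetry = (e_hi - e_lo) / (e_hi + e_lo)
    return (e_lo > 0.050 and asymmetry < 0.6 and cos_open > 0.0
            and -0.015 < dm < 0.010)
```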
Neutral $D$ mesons are reconstructed in the following decay modes: $D^0 \to K^-\pi^+\pi^0$, $K^-\pi^+\pi^+\pi^-$, $K^-\pi^+$, $K^0_S\pi^+\pi^-$, $K^0_S\pi^0$, $K^0_S K^+K^-$, $K^+K^-$, and $\pi^+\pi^-$. Similarly, charged $D$ mesons are reconstructed in the following modes: $D^+ \to K^-\pi^+\pi^+$, $K^0_S\pi^+\pi^0$, $K^0_S\pi^+\pi^+\pi^-$, $K^0_S\pi^+$, $K^-K^+\pi^+$, and $K^0_S K^+$. The combined branching fractions of the reconstructed channels are 30% for $D^0$ and 22% for $D^+$. For $D$ decays without a $\pi^0$ in the final state, we require the invariant mass of the reconstructed candidates to be within $\pm 15$ MeV/$c^2$ of the nominal $D^0$ or $D^+$ mass, which corresponds to approximately $\pm 2.8$ times the resolution. Channels with a $\pi^0$ in the final state have worse mass resolution, so we require wider windows: from $-45$ to $+30$ MeV/$c^2$ around the nominal $D^0$ mass, and from $-36$ to $+24$ MeV/$c^2$ around the nominal $D^+$ mass. These windows correspond to approximately $[-1.1, +1.6]$ and $[-1.0, +1.4]$ times the resolution, respectively. Candidate $D^{*+}$ mesons are reconstructed in the channels $D^0\pi^+$ and $D^+\pi^0$, and $D^{*0}$ mesons in the channel $D^0\pi^0$. We do not consider the $D^{*0} \to D^0\gamma$ decay channel because of its higher background level.
We require the mass difference $\Delta m = m_{D^*} - m_D$ to be within $\pm 2.5$ MeV/$c^2$ of its nominal value for the $D^{*+} \to D^0\pi^+$ decay mode. For the $D^{*+} \to D^+\pi^0$ and $D^{*0} \to D^0\pi^0$ decay modes, the window is $\pm 2.0$ MeV/$c^2$. These windows correspond to $\pm 3.0$ and $\pm 1.9$ times the resolution, respectively. The tighter window in the $D^*$ modes that contain a low-momentum ("slow") $\pi^0$ suppresses the large background from misreconstructed neutral pions.
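The $D$ mass windows and $\Delta m$ windows above can be collected into lookup tables. This is a minimal sketch: the nominal masses and mass differences are rounded PDG values in GeV/$c^2$, quoted for illustration. The $\Delta m$ windows are assumed to be centred on the nominal mass difference. All names are hypothetical.

```python
# Nominal masses and mass differences in GeV/c^2 (rounded PDG values).
M_NOMINAL = {"D0": 1.86484, "D+": 1.86966}
DM_NOMINAL = {
    "D*+ -> D0 pi+": 0.14542,
    "D*+ -> D+ pi0": 0.14060,
    "D*0 -> D0 pi0": 0.14201,
}

# (lower, upper) offsets from the nominal D mass; key = (flavor, has_pi0).
D_MASS_WINDOW = {
    ("D0", False): (-0.015, 0.015),
    ("D+", False): (-0.015, 0.015),
    ("D0", True): (-0.045, 0.030),
    ("D+", True): (-0.036, 0.024),
}

# Half-widths of the Delta m windows, centred on DM_NOMINAL (assumption).
DM_WINDOW = {
    "D*+ -> D0 pi+": 0.0025,
    "D*+ -> D+ pi0": 0.0020,
    "D*0 -> D0 pi0": 0.0020,
}


def passes_d_mass(flavor, has_pi0, m_reco):
    """Asymmetric window on the reconstructed D mass."""
    lo, hi = D_MASS_WINDOW[(flavor, has_pi0)]
    return lo < m_reco - M_NOMINAL[flavor] < hi


def passes_delta_m(mode, m_dstar_reco, m_d_reco):
    """Window on Delta m = m(D*) - m(D) around its nominal value."""
    dm = m_dstar_reco - m_d_reco
    return abs(dm - DM_NOMINAL[mode]) < DM_WINDOW[mode]
```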
On the signal side, we require $\cos\theta_{B,D^{(*)}\ell} < 1.0$ and a $D^{(*)}$ momentum in the $\Upsilon(4S)$ rest frame below 2.0 GeV/$c$. Finally, we require that events contain no additional prompt charged tracks, $K^0_S$ candidates, or $\pi^0$ candidates, reconstructed with the same criteria as those used for the $D$ candidates. All selection criteria used in the event reconstruction were chosen on the basis of optimization studies. When multiple $B_{\rm tag}$ or $B_{\rm sig}$ candidates are found in an event, we first select the $B_{\rm tag}$ candidate with the highest tagging classifier output. We then select the $B_{\rm sig}$ candidate with the highest $p$-value from the vertex fit of its charm daughter.
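The two-step best-candidate selection can be sketched as follows, assuming each $(B_{\rm tag}, B_{\rm sig})$ combination is stored as a record. The record carries the tag index, the tag classifier output, and the vertex-fit $p$-value of the signal-side charm daughter. The `CandidatePair` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    tag_id: int               # index of the B_tag candidate
    tag_score: float          # tagging-classifier output of that B_tag
    sig_vertex_pvalue: float  # vertex-fit p-value of the B_sig charm daughter


def select_best(pairs):
    """Pick the B_tag with the highest classifier output, then the B_sig
    paired with it that has the highest vertex-fit p-value."""
    if not pairs:
        return None
    best_tag = max(pairs, key=lambda p: p.tag_score).tag_id
    same_tag = [p for p in pairs if p.tag_id == best_tag]
    return max(same_tag, key=lambda p: p.sig_vertex_pvalue)
```

Choosing the tag first means that a signal-side candidate with an excellent vertex fit cannot override a poorer tag decision.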
To distinguish signal and normalization events from background processes, we use the sum of the energies of neutral ECL clusters that are not associated with any reconstructed particle, denoted $E_{\rm ECL}$. To mitigate the effect of beam-background photons on $E_{\rm ECL}$, we only include clusters above region-dependent energy thresholds [18]: 50 MeV in the barrel, 100 MeV in the forward region, and 150 MeV in the backward region. Signal and normalization events peak near zero in $E_{\rm ECL}$, while background events populate a wider range. We require $E_{\rm ECL} < 1.2$ GeV.
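The $E_{\rm ECL}$ computation can be sketched as a filtered sum, assuming each cluster is given as an `(energy_gev, region, is_used)` tuple. Here `is_used` marks clusters already associated with a reconstructed particle. The tuple layout and function names are hypothetical; the thresholds and the 1.2 GeV cut are those quoted in the text.

```python
# Minimum cluster energy (GeV) per ECL region, as quoted in the text.
CLUSTER_THRESHOLD = {"barrel": 0.050, "forward": 0.100, "backward": 0.150}


def e_ecl(clusters):
    """Sum energies of neutral clusters not used by any reconstructed particle.

    `clusters` is an iterable of (energy_gev, region, is_used) tuples.
    """
    return sum(energy for energy, region, is_used in clusters
               if not is_used and energy > CLUSTER_THRESHOLD[region])


def passes_e_ecl(clusters, cut=1.2):
    """Event-level requirement E_ECL < 1.2 GeV."""
    return e_ecl(clusters) < cut
```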
To separate reconstructed signal and normalization events, we employ a BDT based on the XGBoost package [28]. The input variables to the BDT include $\cos\theta_{B,D^{(*)}\ell}$ and the approximate missing mass squared, $m^2_{\rm miss} = (2E_{\rm beam} - \sum_i E_i)^2 - |\sum_i \vec{p}_i|^2$, where $(E_i, \vec{p}_i)$ is the four-momentum of particle $i$. We do not apply any selection on the BDT classifier output, denoted $O_{\rm cls}$; instead, we use it as one of the fitting variables for the extraction of $\mathcal{R}(D^{(*)})$. Signal events have $O_{\rm cls}$ values near 1, while normalization events have values near 0.
We extract the yields of the signal and normalization modes from a two-dimensional (2D) extended maximum-likelihood fit to the variables $O_{\rm cls}$ and $E_{\rm ECL}$. The fit is performed simultaneously on the four $D^{(*)}\ell$ samples and exploits the isospin constraint $\mathcal{R}(D^{(*)0}) = \mathcal{R}(D^{(*)+})$. The distribution of each sample is described as the sum of several components:
$D^{(*)}\tau\nu$;
$D^{(*)}\ell\nu$;
feed-down from $D^*\ell(\tau)\nu$ to $D\ell(\tau)\nu$;
$D^{**}\ell(\tau)\nu$;
other backgrounds.
The PDFs of these components are determined from MC simulation as 2D histogram templates. A large fraction of $B \to D^*\ell\nu$ decays from both $B^0$ and $B^+$ are reconstructed in the $D\ell$ samples; these events are denoted feed-down. We leave these two contributions free in the fit and use their fitted yields to correct the MC-estimated feed-down rate of $B \to D^*\tau\nu$ decays. Feed-down events from the $D^*$ modes are treated as a component of the signal or normalization yields. The probability that $B \to D\ell(\tau)\nu$ decays contribute to the $D^*\ell$ samples is very small, so the relative rates of these contributions are fixed to their MC expected values.
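A binned extended maximum-likelihood fit with fixed templates can be sketched compactly. The example below is a simplified illustration, not the analysis fit. It fits a single sample, leaves every component yield free, and omits the simultaneous four-sample structure, the isospin constraint, and the fixed background yields. It maximises the likelihood with the expectation-maximisation (EM) fixed-point update, a technique swapped in here for illustration; the text does not name the minimiser actually used. The function name and the toy templates in the test are hypothetical.

```python
def fit_template_yields(data_2d, templates_2d, n_iter=2000):
    """Binned extended maximum-likelihood yields for fixed 2D templates.

    data_2d: observed counts indexed [O_cls bin][E_ECL bin].
    templates_2d: one 2D histogram per component (normalised here).
    Uses the EM update nu_j <- nu_j * sum_b n_b t_jb / mu_b. Its fixed point
    satisfies dlnL/dnu_j = 0 for the extended Poisson likelihood with
    unit-normalised templates.
    """
    data = [n for row in data_2d for n in row]
    templates = []
    for t2d in templates_2d:
        flat = [v for row in t2d for v in row]
        total = sum(flat)
        templates.append([v / total for v in flat])

    k = len(templates)
    yields = [sum(data) / k] * k  # start from equal yields
    for _ in range(n_iter):
        updated = [0.0] * k
        for b, n_b in enumerate(data):
            mu_b = sum(y * t[b] for y, t in zip(yields, templates))
            if n_b == 0 or mu_b <= 0.0:
                continue
            for j in range(k):
                updated[j] += n_b * yields[j] * templates[j][b] / mu_b
        yields = updated
    return yields
```

Each EM step preserves the total yield and never decreases the likelihood. With well-separated templates, as for signal at high $O_{\rm cls}$ and normalization at low $O_{\rm cls}$, it converges quickly.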
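The free parameters in the final fit are the yields of the signal, normalization, $B \to D^{**}\ell\nu$, and $D^*$-to-$D$ feed-down components. The yields of other backgrounds are fixed to their MC expected values. The ratios $\mathcal{R}(D^{(*)})$ are given by
$$\mathcal{R}(D^{(*)}) = \frac{1}{2\mathcal{B}(\tau^- \to \ell^- \bar{\nu}_\ell \nu_\tau)} \cdot \frac{\varepsilon_{\rm norm}}{\varepsilon_{\rm sig}} \cdot \frac{N_{\rm sig}}{N_{\rm norm}},$$
where $\varepsilon_{\rm sig(norm)}$ is the detection efficiency of the signal (normalization) mode, including the tagging efficiency, and $N_{\rm sig(norm)}$ is its fitted yield. $\mathcal{B}(\tau^- \to \ell^- \bar{\nu}_\ell \nu_\tau)$ is the average of the world-average branching fractions for $\ell = e$ and $\ell = \mu$.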
To improve the accuracy of the MC simulation, we apply a series of correction factors determined from control-sample measurements. These include factors associated with lepton and hadron identification efficiencies and with slow-pion tracking efficiencies. Correction factors for the lepton efficiencies are evaluated as a function of the lepton momentum and direction using $e^+e^- \to e^+e^-\ell^+\ell^-$ processes and $J/\psi \to \ell^+\ell^-$ decays. Fake and misreconstructed $D^{(*)}$ mesons are treated as background, and we determine their expected yield from data sidebands in the difference between the reconstructed and nominal masses. We also correct for differences in the reconstruction efficiency of the tagging algorithm between data and MC simulation.
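Applying binned data/MC correction factors usually reduces to a table lookup per particle. This minimal sketch assumes the lepton "direction" is binned in the cosine of the polar angle. It returns a factor of 1 outside the binned range, and it forms the event weight as the product of per-lepton factors. All of these are assumptions for illustration, not the analysis conventions. The bin edges and values in the test are hypothetical.

```python
import bisect


def lookup_correction(p, cos_theta, p_edges, cos_edges, table, default=1.0):
    """Data/MC correction factor for a lepton with momentum p (GeV/c) and
    polar-angle cosine cos_theta; `default` outside the binned range.

    table[i][j] covers p_edges[i] <= p < p_edges[i+1] and
    cos_edges[j] <= cos_theta < cos_edges[j+1].
    """
    i = bisect.bisect_right(p_edges, p) - 1
    j = bisect.bisect_right(cos_edges, cos_theta) - 1
    if 0 <= i < len(p_edges) - 1 and 0 <= j < len(cos_edges) - 1:
        return table[i][j]
    return default


def event_weight(leptons, p_edges, cos_edges, table):
    """Product of per-lepton correction factors for one event."""
    w = 1.0
    for p, cos_theta in leptons:
        w *= lookup_correction(p, cos_theta, p_edges, cos_edges, table)
    return w
```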
The $E_{\rm ECL}$ projections of the fit are shown in Fig. 1. The fit yields $\mathcal{R}(D) = 0.307 \pm 0.037$ and $\mathcal{R}(D^*) = 0.283 \pm 0.018$, where the uncertainties are statistical only.
To estimate the systematic uncertainty on $\mathcal{R}(D^{(*)})$ due to each fixed parameter, we vary that parameter 500 times. Each variation samples a Gaussian distribution whose mean and width are the parameter's value and uncertainty, and we repeat the fit for each variation. The associated systematic uncertainty is taken as the standard deviation of the resulting distribution of fitted results. The systematic uncertainties are listed in Table I.
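The toy procedure for a fixed parameter can be sketched in a few lines. In this sketch `refit` is a hypothetical callable that stands in for repeating the full likelihood fit with the parameter set to a given value. The linear response in the test is purely illustrative.

```python
import random
import statistics


def toy_systematic(refit, value, sigma, n_toys=500, seed=12345):
    """Spread of fitted results when one fixed parameter is Gaussian-smeared.

    refit(x) returns the fitted quantity (e.g. R(D)) with the fixed
    parameter set to x; it stands in for repeating the full fit.
    """
    rng = random.Random(seed)
    results = [refit(rng.gauss(value, sigma)) for _ in range(n_toys)]
    return statistics.stdev(results)
```

For a response that is linear in the parameter, the result approaches the slope times `sigma`. The toy approach also captures non-linear responses and boundary effects that a simple derivative would miss.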
In Table I, the label "$D^{**}$ composition" refers to the uncertainty introduced by the branching fractions of the $B \to D^{**}\ell\nu$ channels and of the $D^{**}$ meson decays. These are not well known and hence contribute significantly to the total PDF uncertainty. The uncertainties on the $B \to D^{**}\ell\nu$ branching fractions are assumed to be $\pm 6\%$ for $D_1$, $\pm 10\%$ for $D_2^*$, $\pm 83\%$ for $D_1'$, and $\pm 100\%$ for $D_0^*$. The uncertainty on each $D^{**}$ decay branching fraction is conservatively assumed to be $\pm 100\%$.
A large systematic uncertainty arises from the limited size of the MC samples. First, it affects the PDF shapes. To estimate this contribution, we recalculate the PDFs for signal, normalization, fake $D^{(*)}$ events, $B \to D^{**}\ell\nu$, feed-down, and other backgrounds. We do this by generating toy MC samples from the nominal PDFs according to Poisson statistics, and we then repeat the fit with the new PDFs. Second, we vary within their uncertainties the reconstruction efficiency of feed-down events and the ratio of signal to normalization efficiencies, both of which are also limited by the size of the MC samples.
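The Poisson fluctuation of template histograms can be sketched with NumPy's random generator. In this sketch `refit` is a hypothetical callable that rebuilds the PDFs from a fluctuated template and returns the fitted quantity. In a real analysis it would repeat the full fit, and all templates would be fluctuated together for each toy.

```python
import numpy as np


def poisson_fluctuate(template_counts, rng):
    """Draw a toy template: each bin ~ Poisson(nominal bin content)."""
    return rng.poisson(np.asarray(template_counts, dtype=float))


def mc_statistics_systematic(template_counts, refit, n_toys=200, seed=0):
    """Standard deviation of refit results over Poisson-fluctuated templates."""
    rng = np.random.default_rng(seed)
    results = [refit(poisson_fluctuate(template_counts, rng)) for _ in range(n_toys)]
    return float(np.std(results, ddof=1))
```

Fluctuating the raw MC bin contents, rather than the normalised PDF, keeps the size of each fluctuation tied to the number of simulated events in that bin.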
The efficiency correction factors for fake $D^{(*)}$ and $B_{\rm tag}$ reconstruction are calibrated using collision data. Their uncertainties are limited by the size of the calibration samples. We vary these factors within their uncertainties to extract the associated systematic uncertainties.
The effects of the lepton efficiency and fake rate, as well as of the slow-pion efficiency, do not cancel in the $\mathcal{R}(D^{(*)})$ ratios. This is because the momentum spectra of leptons and charm mesons differ between the normalization and signal modes. The uncertainties introduced by these factors are included in the total systematic uncertainty. We also include minor systematic contributions from other sources:
the parameters used to reweight the semileptonic $B \to D^{(*)}\ell\nu$ and $B \to D^{**}\ell\nu$ decays;
the integrated luminosity;
the $B$ production fractions at the $\Upsilon(4S)$, $f_{+-}$ and $f_{00}$;
the branching fractions of $B \to D^{(*)}\ell\nu$, $D$, $D^*$, and $\tau^- \to \ell^-\bar{\nu}_\ell\nu_\tau$ decays [26].
The total systematic uncertainty is obtained by summing all contributions in quadrature.
In conclusion, we have measured the ratios $\mathcal{R}(D^{(*)}) = \mathcal{B}(\bar{B} \to D^{(*)}\tau^-\bar{\nu}_\tau)/\mathcal{B}(\bar{B} \to D^{(*)}\ell^-\bar{\nu}_\ell)$, where $\ell$ denotes an electron or a muon. The measurement uses a semileptonic tagging method and a data sample containing $772 \times 10^6$ $B\bar{B}$ events collected with the Belle detector. The results are
$$\mathcal{R}(D) = 0.307 \pm 0.037 \pm 0.016, \qquad \mathcal{R}(D^*) = 0.283 \pm 0.018 \pm 0.014,$$
where the first uncertainties are statistical and the second are systematic. These results agree with the SM predictions within $0.2\sigma$ and $1.1\sigma$, respectively, and their combination agrees with the SM predictions within $0.8\sigma$. This work constitutes the most precise measurements of $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ performed to date and the first result for $\mathcal{R}(D)$ based on a semileptonic tagging method. We combine these results with the most recent Belle results on $\mathcal{R}(D)$ and $\mathcal{R}(D^*)$ obtained with hadronic tagging [12,14] into a Belle combination. The combination yields $\mathcal{R}(D) = 0.326 \pm 0.034$ and $\mathcal{R}(D^*) = 0.283 \pm 0.018$, with a correlation of $-0.47$ between them, and agrees with the SM predictions within 1.6 standard deviations.
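The quoted agreements with the SM can be cross-checked roughly from the numbers in this Letter. This sketch rests on stated assumptions. All uncertainties are treated as Gaussian. Statistical, systematic, and SM uncertainties are added in quadrature. For the Belle combination, any correlation between the two SM predictions is neglected, and the 2-degree-of-freedom $\chi^2$ is converted to a two-sided Gaussian-equivalent significance. The $0.8\sigma$ figure for this measurement's combined result is not reproduced, because the $\mathcal{R}(D)$-$\mathcal{R}(D^*)$ correlation of this measurement is not given in this section.

```python
import math
from statistics import NormalDist

# HFLAV SM averages quoted in the text: (central value, uncertainty).
R_SM = {"D": (0.299, 0.003), "D*": (0.258, 0.005)}


def pull_1d(value, stat, syst, sm, sm_err):
    """Deviation in units of the quadrature sum of stat, syst and SM errors."""
    return abs(value - sm) / math.sqrt(stat**2 + syst**2 + sm_err**2)


def pull_2d(x, cov_exp, sm, sm_err):
    """Gaussian-equivalent significance of a correlated 2D deviation.

    Neglects any correlation between the two SM predictions (assumption).
    """
    d = [x[0] - sm[0], x[1] - sm[1]]
    c11 = cov_exp[0][0] + sm_err[0] ** 2
    c22 = cov_exp[1][1] + sm_err[1] ** 2
    c12 = cov_exp[0][1]
    det = c11 * c22 - c12 * c12
    chi2 = (d[0] ** 2 * c22 - 2.0 * d[0] * d[1] * c12 + d[1] ** 2 * c11) / det
    p_value = math.exp(-chi2 / 2.0)  # chi^2 survival function for 2 d.o.f.
    return NormalDist().inv_cdf(1.0 - p_value / 2.0)


# This measurement: (value, stat, syst) from the text.
z_d = pull_1d(0.307, 0.037, 0.016, *R_SM["D"])
z_dstar = pull_1d(0.283, 0.018, 0.014, *R_SM["D*"])

# Belle combination: uncertainties 0.034 and 0.018, correlation -0.47.
s1, s2, rho = 0.034, 0.018, -0.47
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
z_belle = pull_2d((0.326, 0.283), cov,
                  (R_SM["D"][0], R_SM["D*"][0]),
                  (R_SM["D"][1], R_SM["D*"][1]))
```

Under these assumptions, the one-dimensional pulls round to the quoted $0.2\sigma$ and $1.1\sigma$. The two-dimensional estimate for the Belle combination comes out at about $1.55\sigma$, close to the quoted 1.6 standard deviations. The small difference can come from details such as SM correlations that this sketch ignores.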