Measurement of the $\tau$ lepton polarization and $R(D^*)$ in the decay $\bar{B} \rightarrow D^* \tau^- \bar{\nu}_\tau$ with one-prong hadronic $\tau$ decays at Belle

With the full data sample of $772 \times 10^6$ $B{\bar B}$ pairs recorded by the Belle detector at the KEKB electron-positron collider, the decay $\bar{B} \rightarrow D^* \tau^- \bar{\nu}_\tau$ is studied with the hadronic $\tau$ decays $\tau^- \rightarrow \pi^- \nu_\tau$ and $\tau^- \rightarrow \rho^- \nu_\tau$. The $\tau$ polarization $P_\tau(D^*)$ in two-body hadronic $\tau$ decays is measured, as well as the ratio of the branching fractions $R(D^{*}) = \mathcal{B}(\bar {B} \rightarrow D^* \tau^- \bar{\nu}_\tau) / \mathcal{B}(\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell)$, where $\ell^-$ denotes an electron or a muon. Our results, $P_\tau(D^*) = -0.38 \pm 0.51 {\rm (stat)} ^{+0.21}_{-0.16} {\rm (syst)}$ and $R(D^*) = 0.270 \pm 0.035{\rm (stat)} ^{+0.028}_{-0.025}{\rm (syst)}$, are consistent with the theoretical predictions of the Standard Model. The polarization values of $P_\tau(D^*)>+0.5$ are excluded at the 90\% confidence level.


I. INTRODUCTION
Semileptonic $B$ decays to $\tau$ leptons (semitauonic decays) are theoretically well-studied processes within the standard model (SM) [1][2][3], where the decay process is represented by the tree-level diagram shown in Fig. 1. Compared with the lighter charged leptons, the $\tau$ lepton is more sensitive to new physics (NP) beyond the SM that couples strongly with mass. A prominent candidate is the two-Higgs-doublet model (2HDM) [4], where charged Higgs bosons appear. The contribution of the charged Higgs to the decay process $\bar{B} \rightarrow D^{(*)} \tau^- \bar{\nu}_\tau$ [5] is suggested by many theoretical works (for example, Refs. [6][7][8][9][10]).
The denominator of the ratio $R(D^{(*)})$ is the average over $\ell^- = e^-, \mu^-$ for Belle and BABAR, and $\ell^- = \mu^-$ for LHCb. The ratio cancels numerous uncertainties common to the numerator and the denominator; these include the uncertainty in the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cb}|$, many of the theoretical uncertainties on hadronic form factors (FFs), and experimental reconstruction effects.

II. EXPERIMENTAL APPARATUS
We use the full $\Upsilon(4S)$ data sample containing $772 \times 10^6$ $B\bar{B}$ pairs recorded with the Belle detector [42] at the asymmetric-beam-energy $e^+ e^-$ collider KEKB [43]. The Belle detector is a large-solid-angle magnetic spectrometer that consists of a silicon vertex detector (SVD), a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), a barrel-like arrangement of time-of-flight scintillation counters (TOF) and an electromagnetic calorimeter (ECL) comprised of CsI(Tl) crystals located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux-return located outside the coil is instrumented to detect $K^0_L$ mesons and to identify muons (KLM). The detector is described in detail elsewhere [42]. Two inner detector configurations were used. A 2.0 cm radius beampipe and a 3-layer SVD were used for the first sample of $152 \times 10^6$ $B\bar{B}$ pairs, while a 1.5 cm radius beampipe, a 4-layer SVD and a small-cell inner drift chamber were used to record the remaining $620 \times 10^6$ $B\bar{B}$ pairs [44].

III. MONTE CARLO SIMULATION
Monte Carlo (MC) simulated events are used to establish the analysis criteria, study the background and estimate the signal reconstruction efficiency. Events with a $B\bar{B}$ pair are generated using EvtGen [45], and the $B$ meson decays are reproduced based on branching fractions reported in Ref. [46]. The hadronization process of $B$ meson decays with no experimentally measured branching fraction is inclusively reproduced by Pythia [47]. For continuum $e^+ e^- \rightarrow q\bar{q}$ ($q = u, d, s, c$) events, hadronization of the initial quark pair is described by Pythia, and hadron decays are modeled by EvtGen. Final-state radiation from charged particles is added using Photos [48]. Detector responses are reproduced by the Belle detector simulator based on Geant3 [49]. The MC samples used in this analysis are described below.
The MC sample for the normalization mode ($\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell$) is generated based on heavy quark effective theory (HQET). Since the FF parameters used for the production of the normalization MC sample have been updated as described above, final-state kinematics are corrected to match the latest parameter values.
Semileptonic decays $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$, where $D^{**}$ denotes the excited charm meson states heavier than $D^*$, comprise an important background category as they have a similar decay topology to the signal events. The MC sample for $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ is generated based on the Isgur-Scora-Grinstein-Wise (ISGW) model [53], and decay kinematics are corrected to match the Leibovich-Ligeti-Stewart-Wise (LLSW) model [54]. The branching fractions for $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ with $D^{**} = D_0^*$, $D_1'$, $D_1$ and $D_2^*$ are taken from the world averages [20]. For the $D^{**}$ decays, in addition to experimentally measured modes, we allow unmeasured final states consisting of a $D^{(*)}$ and one or two pions, a $\rho$ meson, or an $\eta$ meson based on quantum-number, phase-space and isospin considerations. The radially excited $D^{(*)}(2S)$ modes are included so that the total branching fraction of $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ becomes about 3%, which is expected from the difference between $\mathcal{B}(\bar{B} \rightarrow X_c \ell^- \bar{\nu}_\ell)$ (where $X_c$ denotes all the possible charmed-meson states) and the sum of the exclusive branching fractions $\mathcal{B}(\bar{B} \rightarrow D^{(*)} \ell^- \bar{\nu}_\ell)$. The $\bar{B} \rightarrow D^{**} \tau^- \bar{\nu}_\tau$ MC sample is generated using the ISGW model. We take the branching fractions from the theoretical estimates of Ref. [55], using the average $R(D^{**})$ of the four approximations discussed there. We do not consider $\bar{B} \rightarrow D^{(*)}(2S) \tau^- \bar{\nu}_\tau$ or other semitauonic modes containing a charmed state heavier than $D^{(*)}(2S)$, as their small phase space suppresses the branching fractions.

Other background:
The MC samples for other background processes, both $B\bar{B}$ events and continuum $e^+ e^- \rightarrow q\bar{q}$ events, are generated based on the past experimental studies reported in Ref. [46]. Unmeasured decay channels are generated with Pythia through the inclusive hadronization process.

A. Reconstruction of the tag side
We conduct the analysis by first identifying events where one of the two $B$ mesons ($B_{\rm tag}$) is reconstructed in one of 1104 exclusive hadronic $B$ decays [56]. A hierarchical multivariate algorithm based on the NeuroBayes neural-network package is employed. More than 100 input variables are used to identify well-reconstructed $B$ candidates, including the difference between the energy of the reconstructed $B_{\rm tag}$ candidate and the beam energy in the $e^+ e^-$ center-of-mass (CM) frame, $\Delta E \equiv E^*_{\rm tag} - E^*_{\rm beam}$, as well as the event shape variables for suppression of $e^+ e^- \rightarrow q\bar{q}$ background. The quality of the $B_{\rm tag}$ candidate is synthesized in a single NeuroBayes output-variable classifier ($O_{\rm NB}$). We require the beam-energy-constrained mass of the $B_{\rm tag}$ candidate,
$$M_{\rm bc} \equiv \sqrt{E^{*2}_{\rm beam} - |\vec{p}^{\,*}_{\rm tag}|^2},$$
where $\vec{p}^{\,*}_{\rm tag}$ is the reconstructed $B_{\rm tag}$ three-momentum in the CM frame, to be greater than 5.272 GeV and the value of $\Delta E$ to be between $-150$ and 100 MeV. Throughout the paper, natural units with $\hbar = c = 1$ are used. We place a requirement on $O_{\rm NB}$ such that about 90% of true $B_{\rm tag}$ and about 30% of fake $B_{\rm tag}$ candidates are retained. If two or more $B_{\rm tag}$ candidates are retained in one event, we select the one with the highest $O_{\rm NB}$.
Due to limited knowledge of hadronic $B$ decays, the branching fractions of the $B_{\rm tag}$ decay modes are not perfectly modeled in the MC simulation. It is therefore essential to calibrate the $B_{\rm tag}$ reconstruction efficiency (tagging efficiency) with control data samples. We determine a scale factor for each $B_{\rm tag}$ decay mode using events where the signal-side $B$ meson candidate ($B_{\rm sig}$) is reconstructed in $\bar{B} \rightarrow D^{(*)} \ell^- \bar{\nu}_\ell$ modes. Further details of the calibration method are described in Ref. [57]. The ratio of measured to expected rates ranges from 0.2 to 1.4, depending on the $B_{\rm tag}$ decay mode, and is 0.72 on average. After the efficiency calibration, the tagging efficiencies are estimated to be about 0.20% for charged $B$ mesons and 0.15% for neutral $B$ mesons.
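The two tag-side kinematic requirements can be illustrated with a minimal Python sketch. It assumes the standard definition $M_{\rm bc} = \sqrt{E^{*2}_{\rm beam} - |\vec{p}^{\,*}_{\rm tag}|^2}$; the function names and the example numbers are hypothetical and are not taken from the analysis software.

```python
import math

def tag_kinematics(e_tag_cm, p_tag_cm, e_beam_cm):
    """Return (M_bc, Delta E) in GeV for a B_tag candidate in the CM frame.

    e_tag_cm  : reconstructed B_tag energy
    p_tag_cm  : reconstructed B_tag three-momentum (px, py, pz)
    e_beam_cm : beam energy
    """
    px, py, pz = p_tag_cm
    p2 = px * px + py * py + pz * pz
    # Guard against a slightly negative argument caused by resolution.
    m_bc = math.sqrt(max(e_beam_cm ** 2 - p2, 0.0))
    delta_e = e_tag_cm - e_beam_cm
    return m_bc, delta_e

def passes_tag_selection(m_bc, delta_e):
    # Thresholds quoted in the text: M_bc > 5.272 GeV, -150 MeV < Delta E < 100 MeV.
    return m_bc > 5.272 and -0.150 < delta_e < 0.100

# Illustrative numbers only (not data).
m_bc, delta_e = tag_kinematics(5.285, (0.0, 0.0, 0.33), 5.29)
print(round(m_bc, 4), round(delta_e, 3), passes_tag_selection(m_bc, delta_e))
```

The $O_{\rm NB}$ requirement and the choice of the highest-$O_{\rm NB}$ candidate would be applied on top of this selection.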

Particle selection
First, daughter particles of $D^*$ and $\tau$ ($K^\pm$, $\pi^\pm$, $K^0_S$, $\gamma$, $\pi^0$ and $\rho^\pm$) and charged leptons ($e^\pm$ and $\mu^\pm$) are reconstructed. For $B_{\rm sig}$ reconstruction, we use different particle selections from those applied for the $B_{\rm tag}$ reconstruction described in Ref. [56].
Charged particles are reconstructed using the SVD and the CDC. All tracks, except for $K^0_S$-daughter candidates, are required to have $dr < 0.5$ cm and $|dz| < 2.0$ cm, where $dr$ and $dz$ are the impact parameters with respect to the interaction point (IP) in the directions perpendicular and parallel, respectively, to the $e^+$ beam axis. Charged-particle types are identified by a likelihood ratio based on the responses of the sub-detector systems. Identification of $K^\pm$ and $\pi^\pm$ candidates is performed by combining measurements of specific ionization ($dE/dx$) in the CDC, the time of flight from the IP to the TOF counter and the photon yield in the ACC. For $\tau$-daughter $\pi^\pm$ candidates, an additional proton veto is required in order to reduce background from baryonic $B$ decays such as $\bar{B} \rightarrow D^* \bar{p} n$. The ECL electromagnetic shower shape, track-to-cluster matching at the inner surface of the ECL, $dE/dx$ in the CDC, the photon yield in the ACC and the ratio of the cluster energy in the ECL to the track momentum measured with the SVD and the CDC are used to identify $e^\pm$ candidates [58]. Muon candidates are selected based on their penetration range and transverse scattering in the KLM [59]. To form $K^0_S$ candidates, we combine pairs of oppositely charged tracks, treated as pions. Standard Belle $K^0_S$ selection criteria are applied [60]: the reconstructed vertex must be detached from the IP, the momentum vector must point back to the IP, and the invariant mass must be within $\pm 30$ MeV of the nominal $K^0_S$ mass [46], which corresponds to about $8\sigma$. (In this section, $\sigma$ denotes the corresponding mass resolution.) Photons are reconstructed using ECL clusters not matched to charged tracks. Photon energy thresholds of 50, 100 and 150 MeV are used in the barrel, forward-endcap and backward-endcap regions, respectively, of the ECL to reject low-energy background photons, such as those originating from the $e^+ e^-$ beams and hadronic interactions of particles with materials in the detector.
Neutral pions are reconstructed in the decay $\pi^0 \rightarrow \gamma\gamma$. For $\pi^0$ candidates from $D$ or $\rho$ decay, referred to as normal $\pi^0$s, we impose the same photon energy thresholds described above. The $\pi^0$ candidate's invariant mass must lie between 115 and 150 MeV, corresponding to about $\pm 3\sigma$ around the nominal $\pi^0$ mass [46]. In order to reduce the number of fake $\pi^0$ candidates, we apply the following $\pi^0$ candidate selection procedure. The $\pi^0$ candidates are sorted in descending order according to the energy of the most energetic daughter. If a given photon is the most energetic daughter of two or more candidates, they are sorted by the energy of the lower-energy daughter. We then retain the $\pi^0$ candidates whose daughter photons are not shared with a higher-ranked candidate. With this criterion, 76% of the correctly reconstructed $\pi^0$ candidates are selected while 54% of the fake $\pi^0$ candidates are removed. The retained $\pi^0$ candidates are used for the $D$ and $\rho$ reconstruction described later.
For the soft $\pi^0$ from $D^*$ decay, we impose a relaxed photon energy threshold of 22 MeV in all ECL regions and the same requirement for the invariant mass of the two photons. Additionally, the energy asymmetry $A_{\pi^0} = (E_h - E_l)/(E_h + E_l)$ is required to be less than 0.6, where $E_h$ and $E_l$ are the energies of the high- and low-energy photon daughters in the laboratory frame. Here, we do not apply the normal-$\pi^0$ candidate selection procedure.
The $\rho$ candidate is formed from the combination of a $\pi^\pm$ and a $\pi^0$. The candidate invariant mass must lie between 0.66 and 0.96 GeV.
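The ranking-and-overlap-removal procedure for normal $\pi^0$ candidates can be sketched as below. This is an illustration under two assumptions that the text does not spell out: the tie-break on the lower-energy daughter is taken in descending order, and "shared with a higher-ranked candidate" is read literally, so photons of every higher-ranked candidate (retained or not) block lower-ranked ones. The data layout and function name are hypothetical.

```python
def select_pi0_candidates(candidates):
    """Each candidate is ((photon_id, energy), (photon_id, energy)), energies in GeV."""
    def rank_key(cand):
        energies = sorted((cand[0][1], cand[1][1]), reverse=True)
        # Primary key: energy of the most energetic daughter (descending).
        # Tie-break: energy of the lower-energy daughter (descending, an assumption).
        return (-energies[0], -energies[1])

    blocked = set()  # photons belonging to any higher-ranked candidate
    kept = []
    for cand in sorted(candidates, key=rank_key):
        photon_ids = {cand[0][0], cand[1][0]}
        if not photon_ids & blocked:
            kept.append(cand)
        blocked |= photon_ids
    return kept

# Toy input: the second candidate reuses photon g1 and is therefore dropped.
candidates = [
    (("g1", 0.50), ("g2", 0.30)),
    (("g1", 0.50), ("g3", 0.20)),
    (("g4", 0.15), ("g5", 0.10)),
]
kept = select_pi0_candidates(candidates)
print(kept)
```

A greedy variant that blocks only photons of retained candidates would keep slightly more candidates; the text does not say which reading is used.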

$D^{(*)}$ reconstruction
After reconstructing the light mesons, we reconstruct the $D$ candidates in 15 decay modes. The $D$ invariant mass requirements are optimized for each decay mode. For the $D^0$ modes used in forming $D^{*0}$ candidates, the reconstructed invariant masses ($M_D$) are required to be within $\pm 2.0\sigma$ ($\pm 1.5\sigma$) of the nominal $D^0$ meson mass [46] for the high (low) signal-to-noise ratio (SNR) modes. For $D^{*+} \rightarrow D^0 \pi^+$ candidates, the $M_D$ requirements are loosened to $\pm 4.0\sigma$ and $\pm 2.0\sigma$ for the high- and low-SNR modes, respectively. The requirements for the $D^+$ candidates are $\pm 2.5\sigma$ for the high-SNR modes and $\pm 1.5\sigma$ for the low-SNR modes around the nominal $D^+$ meson mass [46]. Here, the high-SNR modes are [...]; the low-SNR modes are all remaining $D$ modes. We reconstruct $D^*$ candidates by combining a $D$ candidate with a $\pi^\pm$, $\gamma$, or soft $\pi^0$. The $D^*$ candidates are selected based on the mass difference $\Delta M \equiv M_{D^*} - M_D$, where $M_{D^*}$ denotes the reconstructed invariant mass of the $D^*$ candidate. The $D^{*0} \rightarrow D^0 \gamma$, $D^{*0} \rightarrow D^0 \pi^0$, $D^{*+} \rightarrow D^+ \pi^0$, and $D^{*+} \rightarrow D^0 \pi^+$ candidates are required to have $\Delta M$ within $\pm 1.5\sigma$, $\pm 2.0\sigma$, $\pm 2.0\sigma$ and $\pm 3.5\sigma$, respectively, of the nominal $\Delta M$.

$B_{\rm sig}$ selection
The $B_{\rm sig}$ candidates are formed by associating a $\tau$-daughter meson (signal events) or an $\ell^-$ (normalization events) with a $D^*$ candidate in one of the allowed $B$ meson charge combinations. For the signal mode, if at least one candidate is found in an event, we calculate $\cos\theta_{\rm hel}$ in the rest frame of the $\tau$. Although this frame cannot be determined completely, equivalent kinematic information is obtained using the rest frame of the $\tau^- \bar{\nu}_\tau$ system. This frame is obtained by boosting the laboratory frame along the three-momentum component of the momentum transfer
$$q = p_{e^+e^-} - p_{\rm tag} - p_{D^*},$$
where $p_{e^+e^-}$, $p_{\rm tag}$ and $p_{D^*}$ denote the four-momenta of the $e^+ e^-$ beam, $B_{\rm tag}$, and $D^*$, respectively. In this frame, the energy and the magnitude of the momentum of the $\tau$ lepton are determined only by $q^2$ as
$$E_\tau = \frac{q^2 + m_\tau^2}{2\sqrt{q^2}}, \qquad |\vec{p}_\tau| = \frac{q^2 - m_\tau^2}{2\sqrt{q^2}},$$
where $m_\tau$ is the $\tau$ lepton mass. The cosine of the angle between the momenta of the $\tau$ lepton and its daughter meson is determined by
$$\cos\theta_{\tau d} = \frac{2 E_\tau E_d - m_\tau^2 - m_d^2}{2 |\vec{p}_\tau| |\vec{p}_d|},$$
where $E_{\tau(d)}$ and $\vec{p}_{\tau(d)}$ denote the energy and the momentum of the $\tau$ lepton (the $\tau$ daughter $d$), respectively, and $m_d$ is the mass of the $\tau$ daughter. Through a Lorentz transformation from the rest frame of the $\tau^- \bar{\nu}_\tau$ system to the $\tau$ rest frame, the following relation is obtained:
$$|\vec{p}^{\,\tau}_d| \cos\theta_{\rm hel} = -\gamma |\vec{\beta}| E_d + \gamma |\vec{p}_d| \cos\theta_{\tau d},$$
where $|\vec{p}^{\,\tau}_d| = (m_\tau^2 - m_d^2)/2m_\tau$ is the $\tau$-daughter momentum in the rest frame of the $\tau$, $\gamma = E_\tau/m_\tau$ and $|\vec{\beta}| = |\vec{p}_\tau|/E_\tau$. Solving this relation gives the value of $\cos\theta_{\rm hel}$. Events are required to lie in the physical region $|\cos\theta_{\rm hel}| < 1$, which retains 97% of the reconstructed signal events. As shown in Fig. 2, there is a significant background peak near 1 in the $\tau^- \rightarrow \pi^- \nu_\tau$ sample due to the $\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell$ background. To reject this background, we only use the region $\cos\theta_{\rm hel} < 0.8$ in the fit to the $\tau^- \rightarrow \pi^- \nu_\tau$ sample.
Due to the kinematic constraint that $q^2$ must be greater than $m_\tau^2$, almost no signal events exist with $q^2$ below 4 GeV$^2$. Therefore $q^2 > 4$ GeV$^2$ is required. The variable $E_{\rm ECL}$ is the linear sum of the energy of ECL clusters not used in the event reconstruction. The ECL clusters satisfying the photon-energy requirement defined in the previous section are added to $E_{\rm ECL}$. Signal events ideally have $E_{\rm ECL}$ equal to zero, with a tail in the $E_{\rm ECL}$ distribution from the beam background and from split-off showers that are separated from the main ECL cluster and reconstructed as photon candidates. We require $E_{\rm ECL}$ to be less than 1.5 GeV.
For the normalization mode, we calculate the squared missing mass,
$$M^2_{\rm miss} = (p_{e^+e^-} - p_{\rm tag} - p_{D^*} - p_\ell)^2,$$
where $p_\ell$ denotes the four-momentum of the charged lepton and the other variables were defined earlier. The normalization events populate the region near $M^2_{\rm miss} = 0$ GeV$^2$ because there is exactly one neutrino in an event. We require $-0.5 < M^2_{\rm miss} < 0.5$ GeV$^2$. We further require $E_{\rm ECL}$ to be less than 1.5 GeV.
Finally, for both the signal and the normalization events, we require that there be no extra charged tracks with $dr < 5$ cm and $|dz| < 20$ cm and no extra normal $\pi^0$ candidates.
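The chain of relations leading to $\cos\theta_{\rm hel}$ can be checked numerically with the sketch below. It assumes the standard two-body kinematics in the $\tau^-\bar{\nu}_\tau$ rest frame: $E_\tau = (q^2 + m_\tau^2)/2\sqrt{q^2}$, $|\vec{p}_\tau| = (q^2 - m_\tau^2)/2\sqrt{q^2}$, $\cos\theta_{\tau d} = (2E_\tau E_d - m_\tau^2 - m_d^2)/(2|\vec{p}_\tau||\vec{p}_d|)$ and $|\vec{p}^{\,\tau}_d|\cos\theta_{\rm hel} = -\gamma|\vec{\beta}|E_d + \gamma|\vec{p}_d|\cos\theta_{\tau d}$. The function name and the mass constant are inputs chosen for illustration.

```python
import math

M_TAU = 1.77686  # GeV; tau mass used as an input for this illustration

def cos_theta_hel(q2, e_d, p_d, m_d, m_tau=M_TAU):
    """Helicity-angle cosine from quantities in the tau-nu rest frame.

    q2  : squared momentum transfer (GeV^2)
    e_d : tau-daughter energy in the tau-nu rest frame (GeV)
    p_d : tau-daughter momentum magnitude in that frame (GeV)
    m_d : tau-daughter mass (GeV)
    """
    sqrt_q2 = math.sqrt(q2)
    e_tau = (q2 + m_tau ** 2) / (2.0 * sqrt_q2)
    p_tau = (q2 - m_tau ** 2) / (2.0 * sqrt_q2)
    # Angle between the tau and its daughter, fixed by the massless neutrino.
    cos_tau_d = (2.0 * e_tau * e_d - m_tau ** 2 - m_d ** 2) / (2.0 * p_tau * p_d)
    gamma = e_tau / m_tau
    beta = p_tau / e_tau
    p_d_rest = (m_tau ** 2 - m_d ** 2) / (2.0 * m_tau)
    # Longitudinal boost of the daughter momentum into the tau rest frame.
    return (-gamma * beta * e_d + gamma * p_d * cos_tau_d) / p_d_rest
```

For real candidates the returned value can fall outside $[-1, 1]$ because of resolution and misreconstruction, which is why a requirement on the physical region is needed.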
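A minimal sketch of the squared-missing-mass calculation follows, assuming the usual definition $M^2_{\rm miss} = (p_{e^+e^-} - p_{\rm tag} - p_{D^*} - p_\ell)^2$ with four-vectors stored as $(E, p_x, p_y, p_z)$ in GeV. The helper names and the toy four-vectors are hypothetical.

```python
def minkowski_sq(p):
    """Invariant mass squared of a four-vector (E, px, py, pz)."""
    e, px, py, pz = p
    return e * e - px * px - py * py - pz * pz

def missing_mass_squared(p_beam, p_tag, p_dstar, p_lep):
    """M_miss^2 = (p_beam - p_tag - p_D* - p_lepton)^2 in GeV^2."""
    p_miss = tuple(b - t - d - l for b, t, d, l in zip(p_beam, p_tag, p_dstar, p_lep))
    return minkowski_sq(p_miss)

def passes_normalization_window(m2_miss):
    # Window quoted in the text: -0.5 < M_miss^2 < 0.5 GeV^2.
    return -0.5 < m2_miss < 0.5

# Toy event with exactly one massless neutrino (illustrative numbers only).
p_tag = (5.30, 0.10, -0.20, 0.30)
p_dstar = (2.60, 0.50, 0.40, 1.50)
p_lep = (1.20, -0.30, 0.20, 1.10)
p_nu = (1.00, 0.60, 0.00, 0.80)
p_beam = tuple(sum(c) for c in zip(p_tag, p_dstar, p_lep, p_nu))
m2 = missing_mass_squared(p_beam, p_tag, p_dstar, p_lep)
print(passes_normalization_window(m2))
```

With a single undetected neutrino the result is zero up to rounding, which is why normalization events cluster near $M^2_{\rm miss} = 0$.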

C. Best candidate selection
After event reconstruction, the average number of retained candidates per event is about 1.09 for charged $B$ mesons and 1.03 for neutral $B$ mesons. In events where two or more candidates are reconstructed, 2.1 candidates are found on average. Multiple-candidate events mostly arise from more than one combination of a $D$ candidate with photons or soft pions. For the charged $B$ mode, about 2% of the events are reconstructed both in the $D^{*0} \rightarrow D^0 \gamma$ and $D^{*0} \rightarrow D^0 \pi^0$ modes. Since the latter mode has a much higher branching fraction, we assign these events to the $D^{*0} \rightarrow D^0 \pi^0$ sample. The contribution of this type of multiple-candidate event is negligibly small in the neutral $B$ mode. We then select the most signal-like candidate as follows. For the $D^{*0} \rightarrow D^0 \gamma$ events, we select the candidate with the most energetic photon associated with the $D^0$. For the $D^{*0} \rightarrow D^0 \pi^0$ and $D^{*+} \rightarrow D^+ \pi^0$ events, we select the candidate with the soft $\pi^0$ that has an invariant mass nearest the nominal $\pi^0$ mass. For the $D^{*+} \rightarrow D^0 \pi^+$ events, we select one candidate at random since the multiple-candidate probability is only $O(0.01\%)$. After the $D^*$ candidate selection, roughly 2% of the retained events are reconstructed both in the $\tau^- \rightarrow \pi^- \nu_\tau$ and the $\tau^- \rightarrow \rho^- \nu_\tau$ samples. Since the MC study indicates that about 80% of such events originate from the $\tau^- \rightarrow \rho^- \nu_\tau$ decay, we assign these events to the $\tau^- \rightarrow \rho^- \nu_\tau$ sample.

D. Sample composition
The reconstructed events are categorized in turn as below. Based on this categorization, we construct histogram probability density functions (PDFs) from the MC samples to perform a final fit.

Signal:
Correctly reconstructed signal events that originate from $\tau^- \rightarrow \pi^- (\rho^-) \nu_\tau$ events are categorized in this component. The yield is treated as a free parameter determined by $R(D^*)$ and $P_\tau(D^*)$.
Other $\tau$ cross feed: $\bar{B} \rightarrow D^* \tau^- \bar{\nu}_\tau$ events with other $\tau$ decay modes also contribute to the signal sample. They originate mainly from $\tau^- \rightarrow a_1^- (\rightarrow \pi^- \pi^0 \pi^0) \nu_\tau$ with one or two missing $\pi^0$, or $\tau^- \rightarrow \mu^- \bar{\nu}_\mu \nu_\tau$ with a low-momentum $\mu^-$ that does not reach the KLM. These two modes account for about 80% of this component. The MC study shows that the cross feed events from both $\tau^- \rightarrow a_1^- \nu_\tau$ and $\tau^- \rightarrow \mu^- \bar{\nu}_\mu \nu_\tau$ have negligible impact on our $P_\tau(D^*)$ measurement. In the fit, the yield of this category is determined by $R(D^*)$.
$\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell$: The decay $\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell$ contaminates the signal sample due to misassignment of the $\ell^-$ as a $\pi^-$. We fix the $\bar{B} \rightarrow D^* \ell^- \bar{\nu}_\ell$ yield in the signal sample from the fit to the $M^2_{\rm miss}$ distribution of the normalization sample.
$\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ and hadronic $B$ decays: The $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ decays ($\bar{B} \rightarrow D^{**} \tau^- \bar{\nu}_\tau$ is also included in this category) and hadronic $B$ decays form the most uncertain component due to limited experimental knowledge. When a few particles such as $\pi^0$ mesons are missed, the event topology resembles that of a signal event. We combine these decay modes into one component. The fractions of the $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ decays and hadronic $B$ decays are about 10% and 90%, respectively, according to the MC study. Since it is difficult to estimate the yield of this component using MC simulation or to fix the yield using control data samples, we float the yield in the final fit. One exception is the collection of modes with two charm mesons such as $\bar{B} \rightarrow D^* D_s^{(*)-}$ and $\bar{B} \rightarrow D^* \bar{D}^{(*)} K^-$. Since the branching fractions of these modes have been studied experimentally, we fix their yield using the MC expectation after correction with the branching fractions based on Ref. [46].

Continuum:
Continuum events from the $e^+ e^- \rightarrow q\bar{q}$ process provide a minor contribution of $O(0.1\%)$ in the signal sample. We fix the yield using the MC expectation.
Fake $D^*$: All events containing fake $D^*$ candidates are categorized in this component. This is the main background source in the charged $B$ meson sample. For the neutral $B$ sample, many $D^{*+}$ candidates are reconstructed from the combination of a $D^0$ with a $\pi^\pm$ and are therefore much more cleanly reconstructed than the other $D^*$ modes with $\pi^0$ or $\gamma$. The yield is determined from a comparison of the data and the MC sample in the $\Delta M$ sideband regions.

E. Measurement Method of $R(D^*)$ and $P_\tau(D^*)$
We use the following variables to measure yields of the signal and the normalization modes. For the normalization mode, $M^2_{\rm miss}$ is the most suitable variable due to its high purity. On the other hand, the shape of the $M^2_{\rm miss}$ distribution for the signal mode has a strong correlation with $P_\tau(D^*)$. To measure the signal yield, we use $E_{\rm ECL}$ because it has a small correlation with $P_\tau(D^*)$ and provides good discrimination between the signal and the background modes.
The value of $R(D^*)$ is measured using the formula
$$R(D^*) = \frac{1}{\mathcal{B}^i_\tau} \times \frac{\epsilon^{j}_{\rm norm}}{\epsilon^{ij}_{\rm sig}} \times \frac{N^{ij}_{\rm sig}}{N^{j}_{\rm norm}},$$
where $\mathcal{B}^i_\tau$ denotes the relevant $\tau$ branching fraction, and $\epsilon^{ij}_{\rm sig}$ and $\epsilon^{j}_{\rm norm}$ ($N^{ij}_{\rm sig}$ and $N^{j}_{\rm norm}$) are the efficiencies (the observed yields) for the signal and the normalization modes, respectively. The indices $i$ and $j$ represent the $\tau$ decays ($\tau^- \rightarrow \pi^- \nu_\tau$ or $\rho^- \nu_\tau$) and the $B$ charges (charged $B$ or neutral $B$), respectively. Assuming isospin symmetry, we use $R(D^*) = R(D^{*0}) = R(D^{*+})$.
The value of $P_\tau(D^*)$ is determined using the formula
$$P_\tau(D^*) = \frac{2}{\alpha^i} \, \frac{N^{F\,ij}_{\rm sig} - N^{B\,ij}_{\rm sig}}{N^{F\,ij}_{\rm sig} + N^{B\,ij}_{\rm sig}},$$
where $\alpha^i$ is the polarization-sensitivity coefficient of the $\tau$ decay mode $i$ that enters Eq. (3), and $N^{F(B)\,ij}_{\rm sig}$ denotes the signal yield in the region $\cos\theta_{\rm hel} > (<)\ 0$ and satisfies $N^{F\,ij}_{\rm sig} + N^{B\,ij}_{\rm sig} = N^{ij}_{\rm sig}$. This formula is obtained by calculating
$$N^{F(B)\,ij}_{\rm sig} = \int_{0\,(-1)}^{1\,(0)} \frac{d\Gamma^{ij}(D^{(*)})}{d\cos\theta_{\rm hel}} \, d\cos\theta_{\rm hel}.$$
The differential decay rate $d\Gamma^{ij}(D^{(*)})/d\cos\theta_{\rm hel}$ is given by Eq. (3). As with $R(D^*)$, we use the common parameters $P_\tau(D^*) = P_\tau(D^{*0}) = P_\tau(D^{*+})$. Due to detector efficiency effects, the measured polarization, $P^{\rm raw}_\tau(D^*)$, is biased from the true value of $P_\tau(D^*)$. To correct for this bias, we form a linear function that maps $P_\tau(D^*)$ to $P^{\rm raw}_\tau(D^*)$ using several MC sets with different $P_\tau(D^*)$. This function, denoted the $P_\tau(D^*)$ correction function, is prepared separately for each $\tau$ sample since the detector bias depends on the given $\tau$ mode. We also make a $P_\tau(D^*)$ correction function for the $\rho \leftrightarrow \pi$ cross feed component to take into account the distortion of the $\cos\theta_{\rm hel}$ distribution shape. In the $P_\tau(D^*)$ correction, other kinematic distributions are assumed to be consistent with the SM predictions.
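The structure of the $R(D^*)$ formula can be made explicit with a short derivation. It is a sketch under the assumption that the same number of tagged $B$ mesons, written here with the illustrative symbol $N^j_B$, feeds both the signal and the normalization samples, and that all sub-decay branching fractions and any lepton-flavor averaging factor are absorbed into the efficiencies.

```latex
\begin{align}
N^{ij}_{\rm sig} &= N^{j}_{B}\;
  \mathcal{B}(\bar{B} \to D^{*} \tau^{-} \bar{\nu}_{\tau})\;
  \mathcal{B}^{i}_{\tau}\; \epsilon^{ij}_{\rm sig}, \\
N^{j}_{\rm norm} &= N^{j}_{B}\;
  \mathcal{B}(\bar{B} \to D^{*} \ell^{-} \bar{\nu}_{\ell})\;
  \epsilon^{j}_{\rm norm}, \\
\Rightarrow\quad
R(D^{*}) &\equiv
  \frac{\mathcal{B}(\bar{B} \to D^{*} \tau^{-} \bar{\nu}_{\tau})}
       {\mathcal{B}(\bar{B} \to D^{*} \ell^{-} \bar{\nu}_{\ell})}
  = \frac{1}{\mathcal{B}^{i}_{\tau}}\,
    \frac{\epsilon^{j}_{\rm norm}}{\epsilon^{ij}_{\rm sig}}\,
    \frac{N^{ij}_{\rm sig}}{N^{j}_{\rm norm}} .
\end{align}
```

The common factor $N^j_B$ drops out of the ratio, which is why uncertainties on the number of $B\bar{B}$ pairs and on the tagging efficiency largely cancel.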
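The forward-backward form of the polarization estimator follows from integrating the angular distribution over the two hemispheres. The derivation below assumes the commonly used normalized form $\frac{1}{\Gamma}\frac{d\Gamma}{d\cos\theta_{\rm hel}} = \frac{1}{2}\left[1 + \alpha P_\tau \cos\theta_{\rm hel}\right]$, where $\alpha$ is the mode-dependent sensitivity coefficient; this form is an assumption of the sketch, and detector effects are ignored.

```latex
\begin{align}
\frac{N^{F}}{N} &= \int_{0}^{1} \tfrac{1}{2}\left[1 + \alpha P_\tau \cos\theta_{\rm hel}\right] d\cos\theta_{\rm hel}
  = \frac{1}{2} + \frac{\alpha P_\tau}{4}, \\
\frac{N^{B}}{N} &= \int_{-1}^{0} \tfrac{1}{2}\left[1 + \alpha P_\tau \cos\theta_{\rm hel}\right] d\cos\theta_{\rm hel}
  = \frac{1}{2} - \frac{\alpha P_\tau}{4}, \\
\Rightarrow\quad
\frac{N^{F} - N^{B}}{N^{F} + N^{B}} &= \frac{\alpha P_\tau}{2}
\quad\Longrightarrow\quad
P_\tau = \frac{2}{\alpha}\,\frac{N^{F} - N^{B}}{N^{F} + N^{B}} .
\end{align}
```

A smaller $|\alpha|$ dilutes the asymmetry, so for the same yield a mode with small $|\alpha|$ constrains $P_\tau$ less tightly.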

V. BACKGROUND CALIBRATION AND PDF VALIDATION
To use the MC distributions as histogram PDFs, the MC simulation needs to be verified using calibration data samples. In this section, the calibration of the PDF shapes is discussed.

A. Signal PDF shape
To validate the $E_{\rm ECL}$ shape of the signal component, we use the normalization mode as the control sample. It has similar $E_{\rm ECL}$ properties to the signal component; there is no extra photon from the $B_{\rm sig}$ decay except for bremsstrahlung photons, and therefore the $E_{\rm ECL}$ shape is mostly determined by the background photons. The normalization sample contains about 50 times more events than the expected signal yield. Figure 3 shows a comparison of $E_{\rm ECL}$ between data and MC simulation. The pull of each bin is shown in the bottom panel; hereinafter, the pull in the $i$th bin is defined as
$${\rm Pull}^i = \frac{N^i_{\rm data} - N^i_{\rm MC}}{\sqrt{(\sigma^i_{\rm data})^2 + (\sigma^i_{\rm MC})^2}},$$
where $N^i_{\rm data(MC)}$ and $\sigma^i_{\rm data(MC)}$ denote the number of events and the statistical error, respectively, in the $i$th bin of the data (MC) distribution. The fake $D^*$ yield is scaled based on the calibration discussed in the next section. Since the contribution from the other background components is negligibly small, it is fixed to the MC expectation. The $E_{\rm ECL}$ shape in the MC sample agrees well with the data within statistical uncertainty.
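The bin-by-bin pull can be computed as in the short sketch below, assuming the usual definition in which the data-MC difference is divided by the quadrature sum of the two statistical errors. The function name and the toy numbers are illustrative.

```python
import math

def pulls(n_data, n_mc, err_data, err_mc):
    """Per-bin pull: (N_data - N_MC) / sqrt(sigma_data^2 + sigma_MC^2)."""
    result = []
    for nd, nm, sd, sm in zip(n_data, n_mc, err_data, err_mc):
        denom = math.sqrt(sd * sd + sm * sm)
        # An empty bin with zero errors carries no information; report 0.
        result.append((nd - nm) / denom if denom > 0.0 else 0.0)
    return result

# Toy histogram (not data): errors 6 and 8 combine to 10.
example = pulls([110.0, 95.0], [100.0, 100.0], [6.0, 6.0], [8.0, 8.0])
print(example)
```

If the MC describes the data, the pulls should scatter around zero with unit width.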

B. Fake $D^*$ events
One of the most significant background components arises from fake $D^*$ candidates. The combinatorial fake $D^*$ background processes are difficult to model precisely in the MC simulation. The $E_{\rm ECL}$ shapes for the data and the MC sample are compared using $\Delta M$ sideband regions of 50-500 MeV, 135-190 MeV, 135-190 MeV, and 140-500 MeV for $D^{*0} \rightarrow D^0 \gamma$, $D^{*0} \rightarrow D^0 \pi^0$, $D^{*+} \rightarrow D^+ \pi^0$, and $D^{*+} \rightarrow D^0 \pi^+$, respectively; each excludes about $\pm 4\sigma$ around the $\Delta M$ peak. These sideband regions contain 5 to 50 times more events than the signal region. Figure 4(a) shows the comparison of the $E_{\rm ECL}$ shapes. Although all the $D^*$ and $\tau$ modes are combined in these figures, the $E_{\rm ECL}$ shape has been compared in 16 subsamples of $B$ modes, $D^*$ modes, $\tau$ modes, and the two $\cos\theta_{\rm hel}$ regions. We find good agreement of the $E_{\rm ECL}$ shape within the statistical uncertainty of these mass sideband data samples. We also check the $\cos\theta_{\rm hel}$ distribution in the $\Delta M$ sideband region, as shown in Figs. 4(b) and 4(c). The $\cos\theta_{\rm hel}$ distribution in the MC simulation also shows good agreement with the data within the statistical uncertainty. In both the signal and the normalization samples, yield discrepancies of up to 20% are observed. The fake $D^*$ yields in the signal region of the MC simulation are scaled by the yield ratios of the data to the MC sample in the $\Delta M$ sideband regions.

C. $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ and hadronic $B$ composition
As discussed in Sec. IV D, the yield of the $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ and hadronic $B$ background component is determined in the final fit. The PDF shape of this background must be corrected with data, as a change in the $B$ decay composition may modify the $E_{\rm ECL}$ shape and thereby introduce bias in the measurements of $R(D^*)$ and $P_\tau(D^*)$.
If a background $B$ decay contains a $K^0_L$ in the final state, it may peak in the $E_{\rm ECL}$ signal region. We correct the branching fractions of the $\bar{B} \rightarrow D^* \pi^- K^0_L$ and $\bar{B} \rightarrow D^* K^- K^0_L$ modes in the MC simulation using the measured values [46,61]. We do not apply branching fraction corrections for the other decays with $K^0_L$ because they have relatively small expected yields. However, we assign a 100% uncertainty to these branching fractions when estimating systematic uncertainties, as discussed in Sec. VII.
Other types of hadronic $B$ decay background often contain neutral particles such as $\pi^0$ or $\eta$ as well as pairs of charged particles. We calibrate the rate of hadronic $B$ decays in the signal region based on control samples where one $B$ is fully reconstructed with the hadronic tag, and the signal side is reconstructed in seven final states ($\bar{B} \rightarrow D^* \pi^- \pi^- \pi^+$, $\bar{B} \rightarrow D^* \pi^- \pi^- \pi^+ \pi^0$, $\bar{B} \rightarrow D^* \pi^- \pi^- \pi^+ \pi^0 \pi^0$, $\bar{B} \rightarrow D^* \pi^- \pi^0$, $\bar{B} \rightarrow D^* \pi^- \pi^0 \pi^0$, $\bar{B} \rightarrow D^* \pi^- \eta$, and $\bar{B} \rightarrow D^* \pi^- \eta \pi^0$). Charged and neutral $B$ mesons are reconstructed separately. Pairs of photons with an invariant mass ranging from 500 to 600 MeV are selected as $\eta$ candidates. We then extract the yield of the data and the MC sample in the region $q^2 > 4$ GeV$^2$ and $|\cos\theta_{\rm hel}| < 1$, which is the same requirement as in the signal sample. To calculate $\cos\theta_{\rm hel}$, we assume that (one of) the charged pion(s) is the $\tau$ daughter. The signal-side energy difference $\Delta E^{\rm sig}$ or the beam-energy-constrained mass $M^{\rm sig}_{\rm bc}$ of the $B_{\rm sig}$ candidate is used for the yield extraction. Figure 5 shows the $M^{\rm sig}_{\rm bc}$ distribution for the $B^- \rightarrow D^* \pi^- \eta$ mode as an example. We estimate yield calibration factors by taking ratios of the yields in the data to those in the MC sample. If there is no observed signal event in the calibration sample, we assign a 68% confidence level (C.L.) upper limit on the yield. The obtained calibration factors are summarized in Table I.
Additionally, we correct the branching fractions of the decays $B^- \rightarrow D^{*+} \pi^- \pi^- \pi^0$, $\bar{B} \rightarrow D^* \omega \pi^-$ and $\bar{B} \rightarrow D^* \bar{p} n$ based on Refs. [46,62].
About 80% of the hadronic $B$ background is covered by the calibrations discussed above. We estimate the systematic uncertainties on our observables due to the uncertainties of the calibration factors in Sec. VII.
In the fake $D^{*0}$ component of the charged $B$ channel, as shown in Fig. 6(a), we observe a slight discrepancy between the data and the MC sample in the $M^2_{\rm miss}$ distribution. The $M^2_{\rm miss}$ discrepancy is therefore corrected based on this comparison. The $M^2_{\rm miss}$ distribution after the correction is shown in Fig. 6(b). The yield of the fake $D^*$ component is also corrected with the same method as applied to the signal sample.
After the correction for the fake $D^*$ component, we find that the $M^2_{\rm miss}$ resolution of the data sample is 10 to 20% worse than that of the MC sample. We therefore smear the $M^2_{\rm miss}$ peak width to match that of the data sample. The correction is performed separately for each $D^*$ mode.
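The sideband-based yield scaling amounts to a simple ratio, sketched below. The statistical error assumes independent, unweighted Poisson counts in data and MC, which is a simplification made for this illustration; the function names and numbers are hypothetical.

```python
import math

def sideband_scale(n_data_sb, n_mc_sb):
    """Data/MC yield ratio in the Delta M sideband and its statistical error."""
    ratio = n_data_sb / n_mc_sb
    # Relative error for a ratio of two independent Poisson counts (assumption).
    rel_err = math.sqrt(1.0 / n_data_sb + 1.0 / n_mc_sb)
    return ratio, ratio * rel_err

def scaled_fake_yield(n_mc_signal_region, scale):
    """Fake D* yield expected in the signal region after the sideband scaling."""
    return n_mc_signal_region * scale

# Toy numbers (not data): a 20% excess of data over MC in the sideband.
scale, scale_err = sideband_scale(1200.0, 1000.0)
print(scale, round(scale_err, 4), scaled_fake_yield(80.0, scale))
```

The error on the scale factor is the quantity that would be propagated as a systematic uncertainty on the fixed fake $D^*$ yield.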

VI. MAXIMUM LIKELIHOOD FIT
An extended binned maximum likelihood fit is performed in two steps; we first perform a fit to the normalization sample to determine its yield, and then a simultaneous fit to eight signal samples from combinations of ($B^-$, $\bar{B}^0$), ($\pi^- \nu_\tau$, $\rho^- \nu_\tau$) and ($\cos\theta_{\rm hel} > 0$, $\cos\theta_{\rm hel} < 0$). In the fit, $R(D^*)$ and $P_\tau(D^*)$ are common fit parameters among all the signal samples, while the $\bar{B} \rightarrow D^{**} \ell^- \bar{\nu}_\ell$ and hadronic $B$ yields are free to float. Figure 7 shows the fit result to the normalization sample. The p-value calculated from the agreement between the data and the fitted PDFs is 0.15. The normalization yields are measured to be $4711 \pm 81$ events for the charged $B$ sample and $2502 \pm 52$ events for the neutral $B$ sample, where the errors are statistical. As a cross check, we obtain the branching fractions of $(10.72 \pm 0.70)\%$ for $B^- \rightarrow D^{*0} \ell^- \bar{\nu}_\ell$ and $(10.60 \pm 0.75)\%$ for $\bar{B}^0 \rightarrow D^{*+} \ell^- \bar{\nu}_\ell$, where the values are the sum of $\bar{B} \rightarrow D^* e^- \bar{\nu}_e$ and $\bar{B} \rightarrow D^* \mu^- \bar{\nu}_\mu$. The errors include only a partial set of systematic uncertainties. These are consistent with the world averages $\mathcal{B}(B^- \rightarrow D^{*0} \ell^- \bar{\nu}_\ell) = (11.18 \pm 0.04 \pm 0.38)\%$ and $\mathcal{B}(\bar{B}^0 \rightarrow D^{*+} \ell^- \bar{\nu}_\ell) = (9.75 \pm 0.02 \pm 0.20)\%$, respectively [63].
The fit to the signal samples is performed as shown in Fig. 8, with a p-value of 0.29. The signal yields for the charged and the neutral $B$ samples are $210 \pm 27$ and $88 \pm 11$, respectively, where the errors are statistical. The observables $R(D^*)$ and $P_\tau(D^*)$ are obtained using Eqs. (10) and (11) for the correctly reconstructed signal events. The $\rho \leftrightarrow \pi$ cross feed yield is constrained by $R(D^*)$ and $P_\tau(D^*)$. The other cross feed yield is determined only by $R(D^*)$. The efficiency ratios for the correctly reconstructed signal events are $\epsilon_{\rm norm}/\epsilon_{\rm sig} = 0.97 \pm 0.02$ for the charged $B$ mode and 1. [...]
Figure 9 shows the projections of the fit results in $q^2$, $M^2_{\rm miss}$, $|\vec{p}^{\,*}_\pi|$ and $|\vec{p}^{\,*}_\rho|$, where $\vec{p}^{\,*}_{\pi(\rho)}$ is the momentum of the $\tau$-daughter $\pi$ ($\rho$) in the CM frame. Each PDF component is scaled based on the yield obtained from the fit. All the panels show good agreement between the data and the expectation from the MC simulation.
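The objective function of an extended binned maximum likelihood fit with histogram PDFs can be sketched as follows. This is a generic illustration, not the fit used in the analysis: the component names, template shapes and yields are invented, and the real fit ties the signal yields to $R(D^*)$ and $P_\tau(D^*)$ across eight samples.

```python
import math

def extended_binned_nll(data, templates, yields):
    """Negative log-likelihood of a binned extended fit (constant terms dropped).

    data      : observed counts per bin
    templates : {component: per-bin fractions summing to 1} (histogram PDFs)
    yields    : {component: expected number of events}
    """
    nll = 0.0
    for i, n_obs in enumerate(data):
        mu = sum(yields[name] * shape[i] for name, shape in templates.items())
        if mu <= 0.0:
            return float("inf")
        # Poisson term for bin i: mu - n*ln(mu); ln(n!) is constant in the fit.
        nll += mu - n_obs * math.log(mu)
    return nll

# Toy templates and data (illustrative only).
templates = {"signal": [0.5, 0.3, 0.2], "background": [0.2, 0.3, 0.5]}
data = [90, 90, 120]  # equals the expectation for yields (100, 200)
nll_true = extended_binned_nll(data, templates, {"signal": 100.0, "background": 200.0})
nll_off = extended_binned_nll(data, templates, {"signal": 120.0, "background": 200.0})
print(nll_true < nll_off)
```

A minimizer would scan the free yields, or the parameters that determine them, to find the minimum of this function; summing the same expression over several samples gives a simultaneous fit.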

VII. SYSTEMATIC UNCERTAINTIES
We estimate systematic uncertainties by varying each possible uncertainty source (such as the PDF shape and the signal reconstruction efficiency) with the assumption of a Gaussian error, unless stated otherwise. In several trials, we change each parameter at random, repeat the fit, and take the shifts of values of R(D * ) and P τ (D * ) from all such trials as the corresponding systematic uncertainty that is enumerated in Table II.
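The vary-and-refit procedure can be sketched as below. The text does not specify how the shifts from the trials are combined, so this sketch assumes the root-mean-square of the shifts is quoted; the `fit` callable stands in for the full likelihood fit and, like the parameter name `eff_scale`, is hypothetical.

```python
import math
import random

def systematic_from_trials(fit, nominal_params, sigmas, n_trials=2000, seed=1):
    """RMS shift of a fitted observable when inputs are varied as Gaussians.

    fit            : callable mapping a parameter dict to the fitted observable
    nominal_params : {name: nominal value}
    sigmas         : {name: Gaussian width}; parameters not listed stay fixed
    """
    rng = random.Random(seed)
    nominal = fit(nominal_params)
    sum_sq = 0.0
    for _ in range(n_trials):
        varied = {name: rng.gauss(value, sigmas.get(name, 0.0))
                  for name, value in nominal_params.items()}
        shift = fit(varied) - nominal
        sum_sq += shift * shift
    return math.sqrt(sum_sq / n_trials)

# Toy stand-in for the fit: the observable scales linearly with an efficiency factor.
toy_fit = lambda p: 0.27 * p["eff_scale"]
syst = systematic_from_trials(toy_fit, {"eff_scale": 1.0}, {"eff_scale": 0.05})
print(round(syst, 4))
```

For this linear toy model the result approaches $0.27 \times 0.05$; in a real fit the response need not be linear, which is the reason for using randomized trials rather than a single $\pm 1\sigma$ variation.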
The most significant systematic uncertainty, arising from the hadronic B decay composition, is estimated as follows. Uncertainties of each B decay fraction in the hadronic B decay background are taken from the measured branching fractions or estimated from B 0 , τ -→πν , cosθ hel < 0    The limited MC sample size used in the construction of the PDFs is a major systematic uncertainty source. We estimate this by regenerating the PDFs for each component and each sample using a toy MC approach based on the original PDF shapes. The same number of events are generated to account for the statistical fluctuation.
The PDF shape of the fake D * component has been validated by comparing the data and the MC sample in the ∆M sideband region. However, a slight fluctuation from the decayB → Dτ −ν τ may have a significant impact on the signal yield since this component has almost the same shape as the signal mode, peaking at E ECL = 0 GeV. We incorporate an additional uncertainty by varying the contribution from theB → Dτ −ν τ component within the current uncertainties in the experimental averages [46]: ±32% for B − → D 0 τ −ν τ and ±21% forB 0 → D + τ −ν τ . We take the theoretical uncertainty on the τ polarization of theB → Dτ −ν τ mode into account, which is found to be 0.002 for P τ (D * ) and negligibly small. In addition, we estimate a systematic uncertainty due to the small M 2 miss shape correction for the fake D * component, discussed in Sec. V D. The systematic uncertainties related to the fake D * shape are 3.0% for R(D * ) and 0.008 for P τ (D * ). The fake D * yield, fixed using the ∆M sideband, has an uncertainty that arises from the statistical uncertainties of the yield scale factors. The systematic uncertainties arising from the yield scale factors are 1.6% for R(D * ) and 0.016 for P τ (D * ).
The uncertainties due to the FF parameters in the normalization mode B̄ → D * ℓ − ν̄ ℓ are estimated using the uncertainties in the world-average values [20]. In addition, the uncertainty arising from the M 2 miss shape correction for the normalization sample is estimated as an uncertainty related to B̄ → D * ℓ − ν̄ ℓ .
The uncertainties on the reconstruction efficiencies of the τ -daughter particles and the charged leptons arise from the particle identification efficiencies for π ± and ℓ ± and the reconstruction efficiency for π 0 . They are measured with control samples: the D * + → D 0 (→ K − π + )π + sample for π ± , the τ − → π − π 0 ν τ sample for π 0 , and the γγ → ℓ + ℓ − sample for charged leptons. The sample J/ψ → ℓ + ℓ − from B decays is also used in order to account for the difference in multiplicity between two-photon events and B decay events.
The efficiency uncertainties arising from the MC statistics are varied independently for each component.
Other minor uncertainties arise due to the branching fractions of the τ lepton decays and errors on the parameters of the P τ (D * ) correction function.
In addition, common uncertainty sources between the signal sample and the normalization sample are estimated. Although they largely cancel in R(D * ), there are some residual uncertainties from background components where yields are fixed based on MC expectation. Here, uncertainties on the number of B B̄ pairs and the branching fraction of Υ(4S) → B + B − , B 0 B̄ 0 (1.8%), tagging efficiencies (4.7%), branching fractions of the D decays (3.4%), and D * reconstruction efficiency (4.8%) are evaluated for their impact on the final measurements. For the D * reconstruction efficiency, the uncertainty originates from reconstruction efficiencies of K 0 S , π 0 , K ± and π ± , and is therefore correlated with the efficiency uncertainty of the τ -daughter particles containing π ± and π 0 . This correlation is taken into account in the total systematic uncertainties shown in Table II.
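The covariance matrix C is represented by

$$C = \begin{pmatrix} (\sigma^{R}_{\rm tot})^2 & \rho_{\rm tot}\,\sigma^{R}_{\rm tot}\,\sigma^{P}_{\rm tot} \\ \rho_{\rm tot}\,\sigma^{R}_{\rm tot}\,\sigma^{P}_{\rm tot} & (\sigma^{P}_{\rm tot})^2 \end{pmatrix},$$

where $\rho_{\rm tot}$ and $\sigma^{R(P)}_{\rm tot}$ denote the total correlation factor and the total uncertainty on R(D * ) [P τ (D * )], respectively.

Overall, our result is consistent with the SM prediction. Our measurement of P τ (D * ) excludes the region larger than +0.5 at 90% C.L.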
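As an illustration of how a covariance matrix of this kind enters a comparison with a prediction, the sketch below assembles a 2×2 matrix from two total uncertainties and a correlation factor and evaluates the quadratic form Δᵀ C⁻¹ Δ. The numerical inputs are hypothetical placeholders, not the values of this analysis, and the function names are introduced here only for illustration.

```python
def covariance(sigma_r, sigma_p, rho):
    """2x2 covariance matrix from two total uncertainties and their correlation factor."""
    off_diagonal = rho * sigma_r * sigma_p
    return [[sigma_r ** 2, off_diagonal], [off_diagonal, sigma_p ** 2]]


def chi2(measured, predicted, cov):
    """Quadratic form delta^T C^-1 delta, using the explicit inverse of a 2x2 matrix."""
    d0 = measured[0] - predicted[0]
    d1 = measured[1] - predicted[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # C^-1 = (1/det) * [[d, -b], [-c, a]]
    return (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det


# Hypothetical placeholder inputs (not the measured or predicted values).
cov = covariance(sigma_r=0.04, sigma_p=0.5, rho=0.3)
value = chi2(measured=(0.30, -0.40), predicted=(0.25, -0.50), cov=cov)
print(f"chi2 = {value:.3f}")
```

With zero correlation the expression reduces to the sum of the two squared pulls, which is a convenient sanity check.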
The three results of R(D * ) with the full data sample of Belle are statistically independent. The average R(D * ) measured by Belle is estimated to be 0.292±0.020(stat)± 0.012(syst). In this average, correlation in the uncertainties arising from background semileptonic B decays is taken into account and other uncertainties are regarded as independent. The relative error in the average R(D * ) is 7.5%, which is the most precise result by a single experiment. Compared to the SM prediction [23], the estimated value is 1.7σ higher. Including R(D) measured by Belle [13], compatibility with the SM predictions is 2.5σ, corresponding to a p-value of 0.042.
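An average of several measurements that share a correlated uncertainty component can be sketched with the best linear unbiased estimator, in which the weights are proportional to C⁻¹ applied to a vector of ones. This is a generic technique shown under stated assumptions, not necessarily the exact procedure used for the average quoted above. The input values, the split into independent and common parts, and all function names are hypothetical placeholders.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def blue_average(values, cov):
    """Weighted average with weights proportional to C^-1 * (1, ..., 1)."""
    u = solve(cov, [1.0] * len(values))
    norm = sum(u)
    weights = [x / norm for x in u]
    mean = sum(w * v for w, v in zip(weights, values))
    sigma = (1.0 / norm) ** 0.5  # uncertainty of the combined value
    return mean, sigma, weights


# Hypothetical placeholder inputs: three measurements, each with an
# independent uncertainty and a fully correlated common component.
values = [0.30, 0.28, 0.26]
independent = [0.040, 0.030, 0.045]
common = [0.010, 0.008, 0.012]
cov = [[common[i] * common[j] + (independent[i] ** 2 if i == j else 0.0)
        for j in range(3)] for i in range(3)]

mean, sigma, weights = blue_average(values, cov)
print(f"average = {mean:.3f} +/- {sigma:.3f}")
```

Setting the common component to zero recovers the familiar inverse-variance weighted mean.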
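Our results are R(D * ) = 0.270 ± 0.035(stat) $^{+0.028}_{-0.025}$(syst) and P τ (D * ) = −0.38 ± 0.51(stat) $^{+0.21}_{-0.16}$(syst), which are consistent with the SM predictions. The result excludes P τ (D * ) > +0.5 at 90% C.L. This is the first measurement of the τ polarization in semitauonic decays, providing a new dimension in the search for NP in semitauonic B decays.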

FIG. 11. Summary of the R(D * ) measurements based on the full data sample of Belle and their average. The inner (outer) error bars show the statistical (total) uncertainty. The shaded band is the world average as of early 2016 [20] while the white band is the SM prediction [23]. On each measurement, the tagging method and the choice of the τ decay are indicated, where "SL tag" is the semileptonic tag and h in the τ decay denotes a hadron h = π or ρ.

ACKNOWLEDGMENTS
We acknowledge Y. Sakaki