Measurement of the ratio of branching fractions $\mathcal{B}(\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau})/\mathcal{B}(\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu})$

The branching fraction ratio $\mathcal{R}(D^{*}) \equiv \mathcal{B}(\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau})/\mathcal{B}(\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu})$ is measured using a sample of proton-proton collision data corresponding to $3.0\,\text{fb}^{-1}$ of integrated luminosity recorded by the LHCb experiment during 2011 and 2012. The tau lepton is identified in the decay mode $\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau}$. The semitauonic decay is sensitive to contributions from non-Standard-Model particles that preferentially couple to the third generation of fermions, in particular Higgs-like charged scalars. A multidimensional fit to kinematic distributions of the candidate $\overline{B}^0$ decays gives $\mathcal{R}(D^{*}) = 0.336 \pm 0.027\,(\text{stat}) \pm 0.030\,(\text{syst})$. This result, which is the first measurement of this quantity at a hadron collider, is 2.1 standard deviations larger than the value expected from lepton universality in the Standard Model.

at 200 GeV/$c$. The minimum distance of a track to a primary vertex (PV), the impact parameter, is measured with a resolution of $(15 + 29/p_{\mathrm{T}})\,\mu\text{m}$, where $p_{\mathrm{T}}$ is the component of the momentum transverse to the beam, in GeV/$c$. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [14]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [15]. The online event selection is performed by a trigger [16], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.
A simulation of pp collisions is provided by Pythia [17,18] with a specific LHCb configuration [19]. Decays of hadronic particles are described by EvtGen [20], in which final-state radiation is generated using Photos [21]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [22,23] as described in Ref. [24].
The trigger requirements are chosen to avoid imposing any $p_{\mathrm{T}}$ requirement on the muon or any invariant-mass requirement on the $D^{*+}\mu^{-}$ system, which is crucial for preserving the distinct kinematic distributions of the $\overline{B}^0 \to D^{*+}\tau^{-}(\to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau})\overline{\nu}_{\tau}$ decay. Events are required to pass the hardware trigger either because the decay products of the $D^{*+}$ candidate satisfy the hadron trigger requirements or because high-$p_{\mathrm{T}}$ particles in the event, independent of the $D^{*+}\mu^{-}$ candidate, satisfy one of the hardware trigger requirements. In the software trigger, events are required to meet criteria designed to accept $D^0 \to K^{-}\pi^{+}$ candidates with $p_{\mathrm{T}} > 2\,\text{GeV}/c$. Quality requirements are applied to the tracks of the charged particles that originate from a candidate $D^0$ decay: their momenta must exceed $5\,\text{GeV}/c$ and at least one must have $p_{\mathrm{T}} > 1.5\,\text{GeV}/c$. The momentum vector of the $D^0$ candidate must point back to one of the PVs in the event, and the reconstructed mass must be consistent with the known $D^0$ mass [25].
In the offline reconstruction, the $D^0$ candidates satisfying the trigger are further required to have well-identified $K^{-}$ and $\pi^{+}$ daughters, and the decay vertex is required to be significantly separated from any PV. The invariant mass of the $D^0$ candidate is required to be within $23.5\,\text{MeV}/c^2$ of the peak value, corresponding to approximately three times the $D^0$ mass resolution. These candidates are combined with low-energy pions to form candidate $D^{*+} \to D^0\pi^{+}$ decays, which are subjected to a kinematic and vertex fit of the decay chain. Candidates are then required to have a mass difference $\Delta m \equiv m(D^0\pi^{+}) - m(D^0)$ within $2\,\text{MeV}/c^2$ of the known value, corresponding to approximately 2.5 times the observed resolution. The muon candidate is required to be consistent with a muon signature in the detector, to have momentum $3 < p < 100\,\text{GeV}/c$, to be significantly separated from the primary vertex, and to form a good vertex with the $D^0$ candidate. The $D^{*+}\mu^{-}$ combinations are required to have an invariant mass less than $5280\,\text{MeV}/c^2$, and their momentum vector must point approximately to one of the reconstructed PV locations, which removes combinatorial candidates while preserving a large fraction of semileptonic decays. In addition to the signal candidates, two independent samples of "wrong-sign" candidates, $D^{*+}\mu^{+}$ and $D^0\pi^{-}\mu^{-}$, are formed for estimating the combinatorial background.
The former represents random combinations of $D^{*+}$ candidates with muons from unrelated decays, and the latter is used to model the contribution of misreconstructed $D^{*+}$ decays. The mass regions $5280 < m(D^{*+}\mu^{-}) < 10000\,\text{MeV}/c^2$ and $139 < \Delta m < 160\,\text{MeV}/c^2$ are included in all samples for the study of the combinatorial backgrounds. Finally, a sample of candidates is selected in which the track paired with the $D^{*+}$ fails all muon identification requirements. These $D^{*+}h^{\pm}$ candidates are used to model the background from hadrons misidentified as muons.
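The offline requirements above can be summarized as a single predicate. The sketch below is illustrative only: it assumes that the "peak value" of the $D^0$ mass and the "known value" of $\Delta m$ can be replaced by the PDG averages (1864.84 and about $145.43\,\text{MeV}/c^2$), which the text does not quote, and it omits the particle-identification, vertexing and pointing requirements. The class `Candidate` and the function `classify` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed reference values in MeV/c^2 (PDG averages, not quoted in the text).
M_D0 = 1864.84
DELTA_M_REF = 145.43  # m(D*+) - m(D0)


@dataclass
class Candidate:
    m_d0: float        # reconstructed K- pi+ mass [MeV/c^2]
    delta_m: float     # m(D0 pi+) - m(D0) [MeV/c^2]
    p_mu: float        # muon momentum [MeV/c]
    m_dstar_mu: float  # invariant mass of the D*+ mu system [MeV/c^2]


def classify(c: Candidate) -> str:
    """Return 'signal', 'sideband' or 'rejected' for one candidate."""
    # Requirements common to all samples: D0 mass window and muon momentum range.
    if abs(c.m_d0 - M_D0) > 23.5:
        return "rejected"
    if not 3_000.0 < c.p_mu < 100_000.0:
        return "rejected"
    # Signal region: Delta m peak and m(D*+ mu) below 5280 MeV/c^2.
    if abs(c.delta_m - DELTA_M_REF) < 2.0 and c.m_dstar_mu < 5280.0:
        return "signal"
    # Extended regions kept for combinatorial-background studies.
    if 139.0 < c.delta_m < 160.0 and c.m_dstar_mu < 10_000.0:
        return "sideband"
    return "rejected"
```

For example, a candidate with $\Delta m = 150\,\text{MeV}/c^2$ fails the signal window but is retained in the extended $\Delta m$ region.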
To suppress the contributions of partially reconstructed $B$ decays, including $B$ decays to pairs of charmed hadrons and semileptonic $B \to D^{*+}(n\pi)\mu^{-}\overline{\nu}_{\mu}$ decays with $n \geq 1$ additional pions, the $D^{*+}\mu^{-}$ candidates are required to be isolated from additional tracks in the event. An algorithm based on a multivariate analysis (MVA) method is developed and trained to determine whether a given track is likely to have originated from the signal $B$ candidate or from the rest of the event. For each track in the event, the algorithm uses the track separation from the PV, the track separation from the decay vertex, the angle between the track and the candidate momentum vector, the decay-length significance of the decay vertex under the hypothesis that the track does not originate from the candidate, and the change in this significance under the hypothesis that it does. A signal sample, enriched in $\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau}$ and $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ decays, is constructed by requiring that no track in the event reach a threshold in the MVA output. In addition, the output is used to select three control samples enriched in partially reconstructed $B$ decays of interest for background studies, by requiring either that only one or two tracks be selected by the MVA ($D^{*+}\mu^{-}\pi^{-}$ or $D^{*+}\mu^{-}\pi^{+}\pi^{-}$) or that at least one track selected by the MVA pass $K^{\pm}$ identification requirements ($D^{*+}\mu^{-}K^{\pm}$). These samples are depleted of $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ and $\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau}$ decays and are used to study and constrain the shapes of the remaining backgrounds in the signal sample.
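Given per-track MVA outputs, the sample assignment described above reduces to counting the selected tracks. The sketch below assumes that the MVA scores are already available, that "reach a threshold" means a score at or above it, and that the kaon requirement takes precedence over the track-count categories; the ordering, the charge requirements for the pion categories and the names `Track` and `isolation_sample` are illustrative assumptions, not the analysis code.

```python
from typing import List, NamedTuple, Optional


class Track(NamedTuple):
    mva: float     # isolation-MVA output for this track (hypothetical scale)
    charge: int    # +1 or -1
    is_kaon: bool  # passes K+- identification requirements


def isolation_sample(tracks: List[Track], threshold: float) -> Optional[str]:
    """Assign an event to the signal sample or to one of the control samples."""
    selected = [t for t in tracks if t.mva >= threshold]
    if not selected:
        return "signal"            # no track reaches the MVA threshold
    if any(t.is_kaon for t in selected):
        return "D*+ mu- K"         # at least one selected track is a kaon
    charges = sorted(t.charge for t in selected)
    if charges == [-1]:
        return "D*+ mu- pi-"       # exactly one selected negative track
    if charges == [-1, 1]:
        return "D*+ mu- pi+ pi-"   # exactly two selected, opposite-charge tracks
    return None                    # event not used in this sketch
```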
The efficiencies $\varepsilon_s$ and $\varepsilon_n$ for the signal and normalization channels, respectively, are determined in simulation. They include the effects of the trigger, event reconstruction, event selection, particle identification procedure, isolation method, and detector acceptance. To account for differing detector-occupancy distributions between simulation and data, the simulated samples are reweighted to match the occupancy observed in data. The overall efficiency ratio is $\varepsilon_s/\varepsilon_n = (77.6 \pm 1.4)\%$; its deviation from unity is due primarily to the particle identification requirements, which predominantly remove low-$p_{\mathrm{T}}$ muon candidates, and to the vertex quality requirements.
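A minimal sketch of occupancy reweighting follows, assuming that occupancy is summarized by a single integer bin index per event and that each simulated event carries a pass/fail flag for the full selection. The per-bin weight is the ratio of the normalized data and simulation occupancy distributions; the efficiency ratio would then be `weighted_efficiency(signal_events, w_s) / weighted_efficiency(norm_events, w_n)`. The binning and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical simulated event: (occupancy bin index, passed full selection?)
SimEvent = Tuple[int, bool]


def occupancy_weights(data_bins: List[int], sim_bins: List[int]) -> Dict[int, float]:
    """Weight per occupancy bin = fraction in data / fraction in simulation."""
    data_counts, sim_counts = Counter(data_bins), Counter(sim_bins)
    n_data, n_sim = len(data_bins), len(sim_bins)
    return {b: (data_counts[b] / n_data) / (sim_counts[b] / n_sim) for b in sim_counts}


def weighted_efficiency(events: List[SimEvent], weights: Dict[int, float]) -> float:
    """Fraction of (weighted) simulated events passing the selection."""
    total = sum(weights[b] for b, _ in events)
    passed = sum(weights[b] for b, ok in events if ok)
    return passed / total
```

With three quarters of the data in the high-occupancy bin but only half of the simulation, high-occupancy simulated events receive weight 1.5 and low-occupancy events 0.5, which shifts the efficiency towards its value at high occupancy.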
The separation of the signal from the normalization channel, as well as from background processes, is achieved by exploiting the distinct kinematic distributions that characterize the various decay modes, which result from the difference between the $\mu$ and $\tau$ masses and the presence of extra neutrinos from the decay $\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau}$. The most discriminating kinematic variables are the following quantities, computed in the $B$ rest frame: the muon energy, $E^{*}_{\mu}$; the missing mass squared, defined as $m^{2}_{\text{miss}} = (p^{\mu}_{B} - p^{\mu}_{D} - p^{\mu}_{\mu})^{2}$; and the squared four-momentum transfer to the lepton system, $q^{2} = (p^{\mu}_{B} - p^{\mu}_{D})^{2}$, where $p^{\mu}_{B}$, $p^{\mu}_{D}$ and $p^{\mu}_{\mu}$ are the four-momenta of the $B$ meson, the $D^{*+}$ meson and the muon. The determination of the rest-frame variables requires knowledge of the $B$ candidate momentum vector in the laboratory frame, which is estimated from the measured parameters of the reconstructed final-state particles.
The $B$ momentum direction is determined from the unit vector pointing from the associated PV to the $B$ decay vertex. The component of the $B$ momentum along the beam axis is approximated using the relation $(p_{B})_{z} = \frac{m_{B}}{m_{\text{reco}}}\,(p_{\text{reco}})_{z}$, where $m_{B}$ is the known $B$ mass, and $m_{\text{reco}}$ and $p_{\text{reco}}$ are the mass and momentum of the system of reconstructed particles. The rest-frame variables described above are then calculated using the resulting estimated $B$ four-momentum and the measured four-momenta of the $\mu^{-}$ and $D^{*+}$. Simulation studies show that the rest-frame variables have sufficient resolution (approximately 15% to 20% full width at half maximum) to preserve the discriminating features of the original distributions.
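The approximation can be sketched in a few lines. The code below takes the $B$ direction from the PV to the decay vertex, sets $(p_B)_z = (m_B/m_{\text{reco}})(p_{\text{reco}})_z$, and then evaluates $m^2_{\text{miss}}$ and $q^2$ as Lorentz invariants and $E^*_\mu$ as $p_B \cdot p_\mu / m_B$, which equals the muon energy in the $B$ rest frame and avoids an explicit boost. The $B^0$ mass value ($5279.6\,\text{MeV}/c^2$, rounded PDG value) and the function names are assumptions; the division by the $z$ component of the flight direction presumes a forward-going $B$ candidate, as in LHCb.

```python
import math
from typing import Sequence, Tuple

FourVec = Tuple[float, float, float, float]  # (E, px, py, pz) in MeV
M_B0 = 5279.6  # assumed B0 mass in MeV/c^2 (rounded PDG value)


def mdot(a: Sequence[float], b: Sequence[float]) -> float:
    """Minkowski product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def sub(a: Sequence[float], b: Sequence[float]) -> FourVec:
    return tuple(x - y for x, y in zip(a, b))


def estimate_b(pv, sv, p_reco: Sequence[float]) -> FourVec:
    """B four-momentum from the flight direction and the (p_B)_z scaling relation."""
    flight = [s - p for s, p in zip(sv, pv)]
    length = math.sqrt(sum(f * f for f in flight))
    ux, uy, uz = (f / length for f in flight)
    m_reco = math.sqrt(mdot(p_reco, p_reco))
    pz_b = M_B0 / m_reco * p_reco[3]   # (p_B)_z = m_B / m_reco * (p_reco)_z
    p_mag = pz_b / uz                  # |p_B| from the z component and direction
    e_b = math.sqrt(p_mag ** 2 + M_B0 ** 2)
    return (e_b, p_mag * ux, p_mag * uy, p_mag * uz)


def rest_frame_variables(pv, sv, p_dst: FourVec, p_mu: FourVec):
    """Return (m2_miss [MeV^2], q2 [MeV^2], E*_mu [MeV])."""
    p_reco = tuple(a + b for a, b in zip(p_dst, p_mu))
    p_b = estimate_b(pv, sv, p_reco)
    p_miss = sub(sub(p_b, p_dst), p_mu)
    w = sub(p_b, p_dst)                # four-momentum transferred to the leptons
    e_mu_star = mdot(p_b, p_mu) / M_B0  # muon energy in the B rest frame
    return mdot(p_miss, p_miss), mdot(w, w), e_mu_star
```

A useful closure check: for a fictitious two-body decay $B \to D^{*+}\mu^{-}$ with no missing particle, the estimate reproduces the true $B$ four-momentum, so $m^2_{\text{miss}} = 0$ and $q^2 = m_\mu^2$.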
Simulated events are used to derive the kinematic distributions of the signal and $B$ backgrounds that are used to fit the data. The hadronic transition-matrix elements for $\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau}$ and $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ decays are described using form factors derived from heavy quark effective theory [26]. Recent world averages for the corresponding parameters are taken from Ref. [27]. These values, along with their correlations and uncertainties, are included as external constraints on the respective fit parameters. The hadronic matrix elements describing $\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau}$ decays include a helicity-suppressed component, which is negligible in $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ decays [28]. The corresponding parameter is not well constrained by data; hence, the central value and uncertainty from the sum rule presented in Ref. [8] are used as a constraint. It is assumed that the kinematic properties of the $\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau}$ decay are not modified by any extensions of the SM.
For the background semileptonic decays $B \to (D_1(2420), D_2^{*}(2460), D_1(2430))\mu^{-}\overline{\nu}_{\mu}$ (collectively referred to as $B \to D^{**}(\to D^{*+}\pi)\mu^{-}\overline{\nu}_{\mu}$), form factors are taken from Ref. [29]. The slope of the Isgur-Wise function [30,31] is included as a free parameter in the fit, with a constraint derived from fitting the $D^{*+}\mu^{-}\pi^{-}$ control sample. This fit also serves to validate the choice of model for this background. Contributions from $\overline{B}{}^0_s \to (D_{s1}^{+}(2536), D_{s2}^{*+}(2573))\mu^{-}\overline{\nu}_{\mu}$ decays use a similar parameterization, keeping only the lowest-order terms. Semileptonic decays to heavier charmed hadrons decaying as $D^{**} \to D^{*+}\pi\pi$ and semitauonic decays $B \to (D_1(2420), D_2^{*}(2460), D_1(2430))\tau^{-}\overline{\nu}_{\tau}$ are modeled using the ISGW2 [32] parameterization. To improve the modeling of the former, a fit is performed to the $D^{*+}\mu^{-}\pi^{+}\pi^{-}$ control sample to generate an empirical correction to the $q^{2}$ distribution, as the resonances that contribute to this final state and their respective form factors are not known. The contribution of semimuonic decays to excited charm states amounts to approximately 12% of the normalization mode in the fit to the signal sample.
An important background source is $B$ decays into final states containing two charmed hadrons, $B \to D^{*+}H_c X$, followed by semileptonic decay of the charmed hadron, $H_c \to \mu\nu_{\mu}X$. This process occurs at a total rate of 6% to 8% relative to the normalization mode. The template for this process is generated using a simulated event sample of $B^{+}$ and $B^{0}$ decays with an appropriate admixture of final states. Corrections to the simulated template are obtained by fitting the $D^{*+}\mu^{-}K^{\pm}$ control sample. A similar simulated sample is also used to generate kinematic distributions for final states containing a tertiary muon from $B \to D^{*+}D_s^{-}X$ decays, with $D_s^{-} \to \tau^{-}\overline{\nu}_{\tau}$ and $\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau}$. The kinematic distributions of hadrons misidentified as muons are derived from the sample of $D^{*+}h^{\pm}$ candidates. Control samples of $D^{*+}$ ($\Lambda$) decays are used to determine the probabilities for a $\pi$ or $K$ ($p$) to be misidentified as a muon, and to generate a $3 \times 3$ matrix of probabilities for each species to satisfy the criteria for identification as a $\pi$, $K$ or $p$. These are used to determine the composition of the $D^{*+}h^{\pm}$ sample in order to model the background from hadrons misidentified as muons. Two methods are developed to unfold the individual contributions of $\pi$, $K$ and $p$, and they give different values of $\mathcal{R}(D^{*})$. The average of the two methods is taken as the nominal central value, and half the difference is assigned as a systematic uncertainty.
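The two unfolding methods used in the analysis are not described here. As an illustration only, the simplest possibility is direct inversion: if $M_{ij}$ is the probability that a hadron of true species $i$ is identified as species $j$, the observed counts satisfy $n_j = \sum_i t_i M_{ij}$, and the true composition $t$ follows from solving the transposed $3 \times 3$ system. The matrix values in the usage below are invented for the example; the unfolded counts would then be combined with the species-dependent muon misidentification probabilities.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def unfold(id_matrix: Sequence[Sequence[float]], observed: Sequence[float]) -> List[float]:
    """True (pi, K, p) counts from counts identified as (pi, K, p).

    id_matrix[i][j] = P(true species i is identified as species j), so
    observed[j] = sum_i true[i] * id_matrix[i][j]; solve the transposed system.
    """
    transpose = [[id_matrix[i][j] for i in range(3)] for j in range(3)]
    return solve(transpose, list(observed))
```

Direct inversion can amplify statistical fluctuations when the matrix is far from diagonal, which is one reason an analysis may compare more than one unfolding approach.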
Combinatorial backgrounds are classified according to whether or not a genuine $D^{*+} \to D^0\pi^{+}$ decay is present. Wrong-sign $D^0\pi^{-}\mu^{-}$ combinations are used to determine the component with misreconstructed or false $D^{*+}$ candidates. The size of this contribution is constrained by fitting the $\Delta m$ distribution of $D^{*+}\mu^{-}$ candidates in the full $\Delta m$ region. The contribution from correctly reconstructed $D^{*+}$ candidates combined with $\mu^{-}$ from unrelated $b$-hadron decays is determined from wrong-sign $D^{*+}\mu^{+}$ combinations. The size of this contribution is constrained using the mass region $5280 < m(D^{*+}\mu^{\mp}) < 10000\,\text{MeV}/c^2$, which determines the expected ratio of $D^{*+}\mu^{-}$ to $D^{*+}\mu^{+}$ yields. In both cases, the contributions of misidentified muons are subtracted when generating the kinematic distributions for the fit.
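The normalization of the $D^{*+}\mu^{+}$ template can be illustrated as follows, assuming that the right-sign and wrong-sign yields in the high-mass region are already corrected for misidentified muons and that a single, mass-independent ratio applies (the variation of this ratio with mass is treated as a systematic uncertainty). The function and argument names are hypothetical.

```python
from typing import List, Sequence


def combinatorial_template(ws_counts: Sequence[float], ws_misid: Sequence[float],
                           rs_high: float, ws_high: float) -> List[float]:
    """Scale a binned D*+ mu+ distribution to predict the D*+ mu- combinatorial yield.

    ws_counts: wrong-sign counts per bin in the fit region
    ws_misid:  estimated misidentified-muon contribution per bin (subtracted)
    rs_high, ws_high: D*+ mu- and D*+ mu+ yields for 5280 < m < 10000 MeV/c^2
    """
    scale = rs_high / ws_high  # expected ratio of D*+ mu- to D*+ mu+ yields
    return [scale * (n - m) for n, m in zip(ws_counts, ws_misid)]
```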
The binned $m^{2}_{\text{miss}}$, $E^{*}_{\mu}$ and $q^{2}$ distributions in data are fit using a maximum-likelihood method with three-dimensional templates representing the signal, the normalization and the background sources. To avoid bias, the procedure is developed and finalized without knowledge of the resulting value of $\mathcal{R}(D^{*})$. The templates extend over the kinematic region $-2 < m^{2}_{\text{miss}} < 10\,\text{GeV}^2/c^4$ in 40 bins, $100 < E^{*}_{\mu} < 2500\,\text{MeV}$ in 30 bins, and $-0.4 < q^{2} < 12.6\,\text{GeV}^2/c^4$ in 4 bins. The fit extracts: the relative contributions of the signal and normalization modes and their form factors; the relative yields of each of the $B \to D^{**}(\to D^{*+}\pi)\mu^{-}\overline{\nu}_{\mu}$ decays and their form factors; the relative yields of $\overline{B}{}^0_s \to D_s^{**+}(\to D^{*+}K^0_{\mathrm{S}})\mu^{-}\overline{\nu}_{\mu}$ and $B \to D^{**}(\to D^{*+}\pi\pi)\mu^{-}\overline{\nu}_{\mu}$ decays; the relative yield of $B \to D^{*+}H_c(\to \mu\nu X')X$ decays; the yields of misreconstructed $D^{*+}$ and combinatorial backgrounds; and the background yield from hadrons misidentified as muons, separately above and below $|\vec{p}_{\mu}| = 10\,\text{GeV}/c$. Uncertainties in the template shapes due to the finite number of simulated events, which are uncorrelated from bin to bin, are incorporated directly into the likelihood using the Beeston-Barlow "lite" procedure [33]. Shape uncertainties with bin-to-bin correlations (e.g. form-factor uncertainties) are included via interpolation between nominal and alternative histograms. The control samples for partially reconstructed backgrounds ($D^{*+}\mu^{-}\pi^{-}$, $D^{*+}\mu^{-}\pi^{+}\pi^{-}$ and $D^{*+}\mu^{-}K^{\pm}$) are fit independently of the signal sample. Since the selections used for these control samples invert the isolation requirement used to select the signal sample, this method allows the corrections to the $B \to D^{*+}H_c(\to \mu\nu X')X$ and $B \to D^{*+}\pi\pi\mu^{-}\overline{\nu}_{\mu}$ backgrounds to be determined with negligible influence from signal and normalization events. The results are validated with an independently developed alternative fit.
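The two likelihood ingredients can be sketched on flattened histograms. The code assumes linear vertical interpolation between nominal and alternative templates (the interpolation scheme is not specified in the text) and implements the "lite" treatment as one nuisance parameter $\beta_i$ per bin scaling the total prediction $\mu_i$, with a Gaussian constraint of relative width $\sigma_i$ set by the simulation statistics. Minimizing $\beta\mu - n\ln(\beta\mu) + (\beta - 1)^2/2\sigma^2$ gives $\beta^2 + (\mu\sigma^2 - 1)\beta - n\sigma^2 = 0$, whose positive root is used. Function names are hypothetical and constant terms of the likelihood are dropped.

```python
import math
from typing import List, Optional, Sequence


def morph(nominal: Sequence[float], alternative: Sequence[float], alpha: float) -> List[float]:
    """Linear vertical interpolation: alpha = 0 gives nominal, alpha = 1 the alternative."""
    return [n + alpha * (a - n) for n, a in zip(nominal, alternative)]


def bb_lite_beta(n: float, mu: float, rel_err: float) -> float:
    """Profiled per-bin scale factor for the Beeston-Barlow 'lite' constraint."""
    if rel_err <= 0.0 or mu <= 0.0:
        return 1.0
    s2 = rel_err ** 2
    b = mu * s2 - 1.0
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * n * s2))  # positive root


def nll(data: Sequence[float], templates: Sequence[Sequence[float]],
        yields: Sequence[float], rel_errs: Optional[Sequence[float]] = None) -> float:
    """Binned Poisson negative log-likelihood, optionally with per-bin beta profiling."""
    value = 0.0
    for i, n in enumerate(data):
        mu = sum(y * t[i] for y, t in zip(yields, templates))  # total prediction
        err = 0.0 if rel_errs is None else rel_errs[i]
        beta = bb_lite_beta(n, mu, err)
        lam = beta * mu
        value += lam - (n * math.log(lam) if n > 0 else 0.0)
        if err > 0.0:
            value += 0.5 * (beta - 1.0) ** 2 / err ** 2  # Gaussian constraint on beta
    return value
```

Because each $\beta_i$ is profiled analytically, the number of free fit parameters does not grow with the number of bins.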
In this second approach, the control samples are fit simultaneously with the signal sample, with the correction parameters allowed to vary, so that correlations among parameters are incorporated exactly. This fit also forgoes interpolation in favor of reweighting the simulated samples and recomputing the kinematic distributions for each value of the corresponding parameters. The two fits are extensively cross-checked and give consistent results. The results of the fit to the signal sample are shown in Fig. 1. The values of the $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ form-factor parameters determined by the fit agree with the current world averages. The fit finds $363\,000 \pm 1600$ $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ decays in the signal sample and an uncorrected ratio of yields $N(\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau})/N(\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}) = (4.54 \pm 0.46)\times 10^{-2}$. Accounting for the $\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau}$ branching fraction [25] and the ratio of efficiencies results in $\mathcal{R}(D^{*}) = 0.336 \pm 0.034$, where the uncertainty includes the statistical uncertainty, the uncertainty due to form factors, and the statistical uncertainty in the kinematic distributions used in the fit. Since the signal yield is large, this uncertainty is dominated by the determination of the various background yields in the fit and their correlations with the signal, which are as large as $-0.68$ in the case of $B \to D^{*+}H_c(\to \mu\nu X')X$.
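The conversion from the fitted yield ratio to $\mathcal{R}(D^{*})$ is a single division, $\mathcal{R}(D^{*}) = [N_\tau/N_\mu] / [\mathcal{B}(\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau})\,\varepsilon_s/\varepsilon_n]$. The check below assumes $\mathcal{B}(\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau}) \approx 0.1741$, a world-average value not quoted in the text, and propagates only the relative uncertainty of the yield ratio, ignoring the uncertainties on the efficiency ratio and branching fraction (which enter as normalization systematics).

```python
# Inputs quoted in the text.
ratio_yields = 4.54e-2      # N(D*+ tau nu) / N(D*+ mu nu)
ratio_yields_err = 0.46e-2
eff_ratio = 0.776           # epsilon_s / epsilon_n

# Assumed input: world-average B(tau- -> mu- nubar_mu nu_tau).
br_tau_mu = 0.1741

r_dstar = ratio_yields / (br_tau_mu * eff_ratio)
r_dstar_err = r_dstar * ratio_yields_err / ratio_yields  # same relative uncertainty
print(f"R(D*) = {r_dstar:.3f} +- {r_dstar_err:.3f}")
```

This reproduces the quoted $0.336 \pm 0.034$.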
Systematic uncertainties on $\mathcal{R}(D^{*})$ are summarized in Table 1. The uncertainty in extracting $\mathcal{R}(D^{*})$ from the fit (model uncertainty) is dominated by the statistical uncertainty of the simulated samples; this contribution is estimated from the reduction in the fit uncertainty when the statistical uncertainty of the simulated samples is not included in the likelihood. The systematic uncertainty from the kinematic shapes of the background from hadrons misidentified as muons is taken to be half the difference in $\mathcal{R}(D^{*})$ between the two unfolding methods. Form-factor parameters are included in the likelihood as nuisance parameters and represent a source of systematic uncertainty; the total uncertainty on $\mathcal{R}(D^{*})$ estimated from the fit therefore incorporates these sources. To separate the statistical uncertainty from the contribution of the form-factor uncertainty, the fit is repeated with the form-factor parameters fixed to their best-fit values, and the reduction in uncertainty is used to determine the form-factor contribution. The systematic uncertainty from the empirical corrections to the kinematic distributions of the $B \to D^{**}(\to D^{*+}\pi\pi)\mu^{-}\overline{\nu}_{\mu}$ and $B \to D^{*+}H_c(\to \mu\nu X')X$ backgrounds is computed in the same way, by fixing the relevant parameters to their best-fit values. The contribution of $B \to D^{**}(\to D^{*+}\pi)\tau^{-}\overline{\nu}_{\tau}$, $B \to D^{**}(\to D^{*+}\pi\pi)\tau^{-}\overline{\nu}_{\tau}$ and $\overline{B}{}^0_s \to (D_{s1}^{+}(2536), D_{s2}^{*+}(2573))\tau^{-}\overline{\nu}_{\tau}$ events is fixed to 12% of the corresponding semimuonic modes, with half of this yield assigned as a systematic uncertainty on $\mathcal{R}(D^{*})$. Similarly, the contribution of $B \to D^{*+}D_s^{-}(\to \tau^{-}\overline{\nu}_{\tau})$ decays is fixed using known branching fractions [25], and 30% changes in the nominal value are taken as a systematic uncertainty. Corrections to the modeling of variables related to the pointing of the $D^0$ candidates to the PV are needed to derive the kinematic distributions for the fit.
These corrections are derived from a comparison of simulated $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ events with a pure $\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu}$ data sample, and a systematic uncertainty is assigned by computing an alternative set of corrections using a different selection for this data subsample.
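The "reduction in uncertainty" method is commonly implemented as a subtraction in quadrature, which is assumed here: if $\sigma_{\text{full}}$ is the fit uncertainty with a set of nuisance parameters free and $\sigma_{\text{fixed}}$ the uncertainty with them fixed, the component attributed to those parameters is $\sqrt{\sigma_{\text{full}}^2 - \sigma_{\text{fixed}}^2}$. As an illustration using two numbers quoted in this Letter, going from $0.034$ (statistical, form-factor and simulated-sample uncertainties) to the purely statistical $0.027$ implies a combined non-statistical component of about $0.021$; the individual entries are those of Table 1, not this estimate.

```python
import math


def quadrature_difference(total: float, reduced: float) -> float:
    """Uncertainty component removed when some parameters are fixed (assumed quadrature rule)."""
    if reduced > total:
        raise ValueError("fixing parameters should not increase the uncertainty")
    return math.sqrt(total ** 2 - reduced ** 2)


# 0.034 (stat + form factors + simulation statistics) versus 0.027 (stat only)
print(round(quadrature_difference(0.034, 0.027), 3))
```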
The expected yield of $D^{*+}\mu^{-}$ candidates relative to $D^{*+}\mu^{+}$ candidates (used to model the combinatorial background) varies as a function of $m(D^{*+}\mu^{\mp})$. The size of this effect is estimated in the region $5280 < m(D^{*+}\mu^{\mp}) < 10000\,\text{MeV}/c^2$, and the uncertainty is propagated as a systematic uncertainty on $\mathcal{R}(D^{*})$.
Uncertainties in converting the fitted ratio of signal and normalization yields into $\mathcal{R}(D^{*})$ (normalization uncertainties) come from the finite statistical precision of the simulated samples used to determine the efficiency ratio, and from several other sources. The efficiency of the hardware triggers obtained in simulation differs between magnet polarities and between Pythia versions; the midpoint of the predictions is taken as the nominal value, and the range of variation is taken as a systematic uncertainty on the efficiency ratio. Particle identification efficiencies are applied to simulation based on binned $J/\psi \to \mu^{+}\mu^{-}$ and $D^0 \to K^{-}\pi^{+}$ control samples, which introduces a systematic uncertainty that is estimated by binning the control samples differently and by comparing to simulated particle identification. The signal and normalization form factors alter the expected ratio of detector acceptances, and $1\sigma$ variations of these parameters with respect to the world averages are used to assign a systematic uncertainty. Finally, the uncertainty in the current world-average value of $\mathcal{B}(\tau^{-} \to \mu^{-}\overline{\nu}_{\mu}\nu_{\tau})$ contributes a small normalization uncertainty.
In conclusion, the ratio of branching fractions $\mathcal{R}(D^{*}) = \mathcal{B}(\overline{B}^0 \to D^{*+}\tau^{-}\overline{\nu}_{\tau})/\mathcal{B}(\overline{B}^0 \to D^{*+}\mu^{-}\overline{\nu}_{\mu})$ is measured to be $0.336 \pm 0.027\,(\text{stat}) \pm 0.030\,(\text{syst})$. The measured value is in good agreement with previous measurements by the BaBar and Belle collaborations [3,5] and is 2.1 standard deviations greater than the SM expectation of $0.252 \pm 0.003$ [8]. This is the first measurement of any decay of a $b$ hadron into a final state with tau leptons at a hadron collider, and the techniques demonstrated in this Letter open the possibility of studying a broad range of similar $b$-hadron decay modes with multiple missing particles in hadron collisions in the future.
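The quoted deviation from the SM can be cross-checked under a simple Gaussian approximation, assuming uncorrelated statistical, systematic and theory uncertainties combined in quadrature. The exact procedure behind the quoted 2.1 standard deviations is not stated here, so this is only a consistency check.

```python
import math

r_meas, stat, syst = 0.336, 0.027, 0.030   # this measurement
r_sm, sm_err = 0.252, 0.003                # SM expectation

total_exp = math.hypot(stat, syst)         # experimental uncertainty, about 0.040
total = math.hypot(total_exp, sm_err)      # include the SM prediction uncertainty
n_sigma = (r_meas - r_sm) / total
print(f"{n_sigma:.1f} standard deviations")
```

The result rounds to 2.1 whether or not the small theory uncertainty is included.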
We express our gratitude to our colleagues in the CERN accelerator departments for the excellent performance of the LHC. We thank the technical and administrative staff at the LHCb institutes. We acknowledge support from CERN and from the national agencies: