Search for lepton-flavor violating decays of the Higgs boson in the $\mu\tau$ and e$\tau$ final states in proton-proton collisions at $\sqrt{s}$ = 13 TeV

A search is presented for lepton-flavor violating decays of the Higgs boson to $\mu\tau$ and e$\tau$. The data set corresponds to an integrated luminosity of 137 fb$^{-1}$ collected at the LHC in proton-proton collisions at a center-of-mass energy of 13 TeV. No significant excess has been found, and the results are interpreted in terms of upper limits on lepton-flavor violating branching fractions of the Higgs boson. The observed (expected) upper limits on the branching fractions are, respectively, $\mathcal{B}($H $\to\mu\tau)$ $\lt$ 0.15 (0.15)% and $\mathcal{B}($H$\to$e$\tau)$ $\lt$ 0.22 (0.16)% at 95% confidence level.


Introduction
One of the main goals of the LHC program is to search for processes beyond the standard model (BSM). The properties and decays of the Higgs boson (H) are thus far consistent with expectations of the standard model (SM) [1][2][3][4][5][6]. However, there is considerable motivation to search for BSM decays of the Higgs boson. The lepton-flavor violating (LFV) decays of the Higgs boson [7][8][9] can provide possible signatures of such processes. A previous investigation of the combined results from the CMS experiment constrained the branching fraction $\mathcal{B}$(H $\to$ BSM) to be below 0.36 at the 95% confidence level (CL), leaving room for a large contribution from these decays [10].
Here we report a search for LFV decays of the Higgs boson in the $\mu\tau$ and e$\tau$ channels performed using data collected by the CMS experiment in proton-proton (pp) collisions at a center-of-mass energy of 13 TeV during the 2016-2018 data-taking period, corresponding to an integrated luminosity of 137 fb$^{-1}$. Previously, the CMS experiment set upper limits of 0.25% and 0.61% [30] and the ATLAS experiment set upper limits of 0.28% and 0.47% [31] on $\mathcal{B}$(H $\to\mu\tau$) and $\mathcal{B}$(H $\to$ e$\tau$) at 95% CL, respectively, based on the 2016 data set, corresponding to an integrated luminosity of 36 fb$^{-1}$.
Our search is performed in the $\mu\tau_h$, $\mu\tau_e$, e$\tau_h$, and e$\tau_\mu$ channels, where $\tau_h$, $\tau_e$, and $\tau_\mu$ correspond to the $\tau\to$ hadrons, electron, and muon decay channels of $\tau$ leptons, respectively, each accompanied by its corresponding neutrinos. The e$\tau_e$ and $\mu\tau_\mu$ decays are not considered because of the large background contribution from Z/$\gamma^*$ decays.
Our search significantly improves the sensitivity relative to similar previous studies [30,31,37]. The search uses boosted decision tree (BDT) discriminants to distinguish signal from background; the resulting discriminant distributions are then used in the statistical analysis. Constraints on the branching fractions are extracted under the assumption that only one of the LFV decays contributes additionally to the SM Higgs boson total width. The constraints on the branching fractions are correspondingly translated into limits on the $Y_{e\tau}$ and $Y_{\mu\tau}$ LFV Yukawa couplings.
This paper is organized as follows: a description of the CMS detector is given in Section 2, collision data and simulated events are discussed in Section 3, event reconstruction is described in Section 4, and event selection is described separately for the four decay channels in Section 5. Background estimation and systematic uncertainties are described in Sections 6 and 7, respectively. Results are presented in Section 8 and the paper is summarized in Section 9.

The CMS detector
The CMS detector consists of a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), a brass and scintillator hadron calorimeter (HCAL), and a muon system composed of gaseous detectors. Each subdetector consists of a barrel and two endcap sections. The central feature of the CMS detector is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. The tracking systems and the calorimeters are contained within the solenoid volume; the muon chambers are embedded in the steel flux-return yoke outside the solenoid. Forward calorimeters extend the pseudorapidity ($\eta$) coverage provided by the barrel and endcap detectors.
Events of interest are selected using a two-tiered trigger system. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of $\approx$100 kHz within a fixed latency of $\approx$4 $\mu$s [38]. The second level, the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing that reduces the event rate to $\approx$1 kHz before data storage [39]. A more detailed description of the CMS detector, together with a definition of the coordinate system and kinematic variables, can be found in Ref. [40].

Collision data and simulated events
The search presented makes use of pp collisions collected by the CMS experiment at a center-of-mass energy of 13 TeV in 2016-2018. The total integrated luminosity amounted to 35.9 fb$^{-1}$ in 2016, 41.5 fb$^{-1}$ in 2017, and 59.7 fb$^{-1}$ in 2018. Single-muon triggers with isolation criteria are used to collect the data in the $\mu\tau_h$ channel. Electron-muon triggers are used to collect data in the $\mu\tau_e$ and e$\tau_\mu$ channels. Triggers requiring a single isolated electron, or a combination of an electron and $\tau_h$, are used in the e$\tau_h$ channel. The trigger thresholds are given in Section 5.
Simulated events are used to model signal and background events using several event generators. In all cases parton showering, hadronization, and underlying event properties are modeled using PYTHIA [41] version 8.212. The PYTHIA parameters affecting the description of the underlying event are set to the CUETP8M1 tune in 2016 [42], except for the $t\bar{t}$ events, which use the CP5 tune; the CP5 tune is used for all events in 2017 and 2018 [43]. The NNPDF3.0 parton distribution functions (PDFs) are used for all 2016 events and the NNPDF3.1 PDFs for the 2017 and 2018 events [44].
The simulation of interactions in the CMS detector is based on GEANT4 [45], using the same reconstruction algorithms as used for data. The Higgs bosons are generated in pp collisions predominantly through gluon fusion (ggH) [46], but also via vector boson fusion (VBF) [47], and in association with a vector boson (W or Z) [48]. Such events are generated at next-to-leading order (NLO) in perturbative quantum chromodynamics (QCD) with the POWHEG v2.0 generator [49][50][51][52][53][54], using the implementation of Refs. [55,56]. For the LFV signal, we consider only Higgs bosons produced via the ggH and VBF mechanisms, as the contribution from associated vector boson production is found to be negligible.
The Z $\to\tau\tau$ background events are estimated in a data-driven manner using the embedding technique because it provides a better description of jets, pileup, detector noise, and resolution effects than simulation. These events are obtained from data with well-identified Z $\to\mu\mu$ decays from which muons are removed, and simulated $\tau$ leptons are embedded with the same kinematic variables as the replaced muons. The MADGRAPH5_aMC@NLO generator [57] (version 2.2.2 in 2016, version 2.4.2 in 2017 and 2018) is used to simulate the Z $\to$ ee+jets and Z $\to\mu\mu$+jets processes, the W+jets background process, and the electroweak (EW) W/Z events. They are simulated at leading order with the MLM jet matching and merging schemes [58].
Diboson production is simulated at NLO using the MADGRAPH5_aMC@NLO generator with the FxFx jet-matching and merging scheme [59]. Top quark-antiquark pair and single top quark production are generated at NLO using POWHEG.
The effect of pileup, where events of interest have multiple pp interactions in the same bunch crossing, is taken into account in simulated events by generating concurrent minimum bias events. All simulated events are weighted to match the pileup distribution observed in the data.

Event reconstruction
The particle flow (PF) algorithm [60] reconstructs and identifies each particle in an event through an optimized combination of information from the various subdetectors of the CMS detector. In this process, identifying the PF candidate type (photons, electrons, muons, and charged and neutral hadrons) plays an important role in determining particle direction and energy. The candidate vertex with the largest value of summed physics object $p_T^2$, where $p_T$ is the transverse momentum, is taken to be the primary pp interaction vertex (PV). The physics objects are returned by a jet finding algorithm [61,62] applied to all charged tracks associated with the vertex, plus the corresponding associated missing transverse momentum ($p_T^{\text{miss}}$).
An electron is identified as a track from the PV combined with one or more ECAL energy clusters. These clusters correspond to the electron and possible bremsstrahlung photons emitted when passing through the tracker. Electrons are accepted in the range $|\eta| < 2.5$, except for the region $1.44 < |\eta| < 1.57$ where the detector's service infrastructure is located. They are identified with an efficiency of 80% using a multivariate discriminator that combines observables sensitive to the amount of bremsstrahlung energy deposited along the electron trajectory, the geometric and momentum matching between the electron trajectory and associated clusters, and the distribution in shower energy in the calorimeters [63]. Electrons from photon conversions are removed. The electron momentum is estimated by combining the energy measurement in the ECAL with the momentum measurement in the tracker. The momentum resolution for electrons with $p_T \approx 45$ GeV from Z $\to$ ee decays ranges from 1.7 to 4.5% depending on $|\eta|$. It is generally better in the barrel region than in the endcaps [64].
Muons are measured in the $|\eta| < 2.4$ range using the drift tube, cathode strip chamber, and resistive plate chamber technologies. The efficiency to reconstruct and identify muons is greater than 96%. Matching muons to tracks measured in the silicon tracker results in a relative $p_T$ resolution for muons with $p_T$ up to 100 GeV of 1% in the barrel and 3% in the endcaps [65].
The muon and electron isolations are measured relative to the lepton $p_T^{\ell}$, where $\ell$ is either $\mu$ or e, by summing the scalar $p_T$ of PF particles in a cone of $\Delta R = 0.4$ or 0.3 around the lepton:

$$I_{\text{rel}}^{\ell} = \left( \sum p_T^{\text{PV charged}} + \max\left[ 0, \sum p_T^{\text{neutral}} + \sum p_T^{\gamma} - p_T^{\text{PU}}(\ell) \right] \right) / p_T^{\ell},$$

where $p_T^{\text{PV charged}}$, $p_T^{\text{neutral}}$, and $p_T^{\gamma}$ indicate the $p_T$ of a charged hadron, a neutral hadron, and a photon within the cone, respectively. The neutral particle contribution to isolation from pileup, $p_T^{\text{PU}}(\ell)$, is estimated for the electron from the jet area and the median energy density in the event [66]. For the muon, half of the $p_T$ sum of charged hadrons within the isolation cone, not originating from the PV, is used instead. The charged-particle contribution to isolation from the pileup is rejected by requiring the tracks to originate from the PV.
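As an illustration of the relative isolation just described, the following sketch assumes the commonly used pileup-corrected form, in which the neutral and photon sums minus the pileup estimate are clipped at zero before being added to the charged sum and divided by the lepton $p_T$. The function names and the list-based inputs are hypothetical.

```python
def relative_isolation(lepton_pt, charged_pv_pts, neutral_pts, photon_pts, pileup_pt):
    """Pileup-corrected relative isolation of a lepton.

    charged_pv_pts: pT of charged hadrons from the PV inside the cone
    neutral_pts:    pT of neutral hadrons inside the cone
    photon_pts:     pT of photons inside the cone
    pileup_pt:      estimated neutral pileup contribution inside the cone
    """
    if lepton_pt <= 0.0:
        raise ValueError("lepton pT must be positive")
    charged = sum(charged_pv_pts)
    # Assumption: the corrected neutral component is not allowed to go negative.
    neutral = max(0.0, sum(neutral_pts) + sum(photon_pts) - pileup_pt)
    return (charged + neutral) / lepton_pt


def muon_pileup_estimate(charged_pileup_pts):
    """Muon case from the text: half the pT sum of charged hadrons in the
    cone that do not originate from the PV."""
    return 0.5 * sum(charged_pileup_pts)


# Toy muon: 40 GeV, with some activity in the isolation cone.
pu = muon_pileup_estimate([0.6, 0.4])
iso = relative_isolation(40.0, [2.0, 1.0], [1.0], [0.5], pu)
is_isolated = iso < 0.15  # threshold quoted for muons in the event selection
```

The electron pileup term, based on the jet area and median energy density, would be supplied through the same `pileup_pt` argument.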
The reconstruction of $\tau_h$ is performed using the hadrons-plus-strips algorithm, which combines the signature for charged hadrons composed of tracks left in the tracker and energy depositions in the calorimeters with the signature for electrons or photons from neutral pion decays that are reconstructed as electromagnetic "strips" in $\eta$-$\phi$ space [67], where $\phi$ is the azimuth in radians. The combination of these signatures provides the four-vector for the parent $\tau_h$. Based on the overall neutral versus charged contents of the $\tau_h$ reconstruction, a decay mode is assigned as $h^{\pm}$, $h^{\pm}\pi^0$, $h^{\pm}h^{\mp}h^{\pm}$, or $h^{\pm}h^{\mp}h^{\pm}\pi^0$, where $h^{\pm}$ denotes a charged hadron. The algorithm has a reconstruction efficiency of $\approx$80%.
The $\tau_h$ reconstructed using the hadrons-plus-strips algorithm must be well identified to reject jets, muons, and electrons misidentified as $\tau_h$. A deep neural network (DNN) discriminator is used to further improve $\tau_h$ identification [68]. The input variables to the DNN include the $\tau_h$ lifetime, isolation, and information on PF candidates reconstructed within the $\tau$ lepton signal or isolation cones. A $p_T$-dependent threshold on the output of the DNN is used to distinguish $\tau_h$ from jets. The chosen working point (WP) has a $\tau_h$ identification efficiency of 70% with a misidentification probability of 1%. The DNN can reject electrons and muons misidentified as $\tau_h$ using dedicated criteria based on the consistency between the tracker, calorimeter, and muon detector measurements. In the $\mu\tau_h$ or e$\tau_h$ channel, we use a WP that has an efficiency of 97.5% or 87.5% with a misidentification probability of 1-2% or 0.2-0.3% to discriminate $\tau_h$ against electrons, and we use a WP that has an efficiency of 99.6% or 99.8% with a misidentification probability of 0.04% or 0.06% to discriminate $\tau_h$ against muons, respectively.
Charged hadrons are defined as PF tracks from the PV not reconstructed as electrons, muons, or $\tau_h$ leptons. Neutral hadrons are identified as HCAL energy clusters not assigned to any charged hadron or as excesses in ECAL or HCAL energies relative to the small charged-hadron energy deposit. All the PF hadron candidates are clustered into jets using the infrared- and collinear-safe anti-$k_T$ algorithm [61] with a distance parameter of 0.4. Jet momentum is determined as the vectorial sum of all particle momenta in the jet. It is found from simulation to be, on average, within 5-10% of the true momentum over the entire $p_T$ spectrum and detector acceptance [69]. Jets that contain b quarks are tagged using a DNN-based algorithm, using a WP with efficiency of 70% for a misidentification probability for light-flavor jets of 1% [70].
The interactions from pileup add more tracks and calorimetric energy depositions, thereby increasing the apparent jet momenta. To mitigate this effect, tracks identified as originating from pileup vertices are discarded, and an offset correction is applied to correct the remaining contributions [71]. Jet energy corrections are obtained from simulation studies so that the average measured energy of jets matches that of particle level jets. In-situ measurements of the momentum balance in photon+jet, Z+jets, and multijet events are used to determine any residual differences between the jet energy scale in data and simulation, and appropriate corrections are applied [72]. Additional selection criteria are applied to each jet to remove jets potentially dominated by instrumental effects or reconstruction failures. When combining information from the entire detector, the jet energy resolution typically amounts to 15% at 10 GeV, 8% at 100 GeV, and 4% at 1 TeV. The variable $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is used to measure the separation between reconstructed objects in the detector. Any jet within $\Delta R = 0.5$ of identified leptons is removed. The reconstructed jets must have $p_T > 30$ GeV and $|\eta| < 4.7$. Data collected in the high-$|\eta|$ region of the ECAL endcaps were affected by noise during the 2017 data taking. This is mitigated by discarding events containing jets with $p_T < 50$ GeV and $2.65 < |\eta| < 3.14$ in the 2017 data.
The vector $\vec{p}_T^{\,\text{miss}}$ is computed as the negative of the vector $p_T$ sum of all the PF candidates in an event, and its magnitude is denoted as $p_T^{\text{miss}}$ [73]. The $\vec{p}_T^{\,\text{miss}}$ is modified to account for corrections to the reconstructed jets' energy scale in the event. Anomalous high-$p_T^{\text{miss}}$ events can originate from various reconstruction failures, detector malfunctions, or backgrounds not from beam-beam sources. Such events are rejected using event filters designed to identify more than 85-90% of the spurious high-$p_T^{\text{miss}}$ events with a mistag rate of less than 0.1% [73]. In addition to the event-filtering algorithms, we require the jets to have a neutral hadron energy fraction smaller than 0.9, which rejects more than 99% of jets due to detector noise, independent of jet $p_T$, with a negligible mistag rate. Corrections applied to the $\vec{p}_T^{\,\text{miss}}$ reduce the mismodeling of $p_T^{\text{miss}}$ in simulated Z, W, and Higgs boson events. The corrections are applied to simulated events based on the vectorial difference between the measured $\vec{p}_T^{\,\text{miss}}$ and the total $p_T$ of neutrinos originating from the decay of the Z, W, or Higgs bosons. Their average effect is the reduction of the magnitude of the $p_T^{\text{miss}}$ obtained from the simulation by a few GeV.
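The separation variable and the jet requirements above can be sketched as follows, taking $\Delta R$ as the quadrature sum of $\Delta\eta$ and $\Delta\phi$. The dictionary-based object representation and the function names are hypothetical, and the 2017-specific noise veto is not included.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def select_jets(jets, leptons, min_dr=0.5, min_pt=30.0, max_abs_eta=4.7):
    """Keep jets passing the kinematic cuts that do not overlap a lepton.

    jets and leptons are lists of dicts with 'pt', 'eta', 'phi' keys
    (a hypothetical event format used only for this sketch).
    """
    selected = []
    for jet in jets:
        if jet["pt"] <= min_pt or abs(jet["eta"]) >= max_abs_eta:
            continue
        overlaps = any(
            delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) < min_dr
            for lep in leptons
        )
        if not overlaps:
            selected.append(jet)
    return selected


jets = [
    {"pt": 50.0, "eta": 0.1, "phi": 0.1},   # overlaps the lepton
    {"pt": 40.0, "eta": 2.0, "phi": 1.0},   # kept
    {"pt": 20.0, "eta": 1.0, "phi": -1.0},  # below the pT threshold
]
leptons = [{"pt": 35.0, "eta": 0.0, "phi": 0.0}]
clean = select_jets(jets, leptons)
```

Wrapping $\Delta\phi$ matters for objects on either side of $\phi = \pm\pi$, which would otherwise appear far apart.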
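The definition of the missing transverse momentum as the negative vector sum over PF candidates can be written as a short sketch. The `(pt, phi)` tuple format and the function name are hypothetical, and the jet energy scale propagation and recoil corrections mentioned above are not modeled.

```python
import math


def missing_transverse_momentum(candidates):
    """Return (magnitude, phi) of the negative vector pT sum.

    candidates: iterable of (pt, phi) tuples, one per PF candidate.
    """
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)


# Two back-to-back candidates balance; a single one does not.
balanced, _ = missing_transverse_momentum([(10.0, 0.0), (10.0, math.pi)])
met, met_phi = missing_transverse_momentum([(10.0, math.pi / 2.0)])
```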

Event selection
The signal topology consists of a muon or an electron and an oppositely charged $\tau$ lepton. The events in the $\mu\tau$ and e$\tau$ channels are further divided into leptonic and hadronic channels based on the $\tau$ lepton decay mode ($\tau_\mu$, $\tau_e$, or $\tau_h$). Jets misidentified as electrons or muons are suppressed by imposing the isolation requirements described above. A set of loose selection criteria, known as the 'preselection', is first defined for each channel's signature. Events with more than two jets are not considered in the search. Each channel's events are then divided into categories based on the number of jets in the event (0-, 1-, or 2-jet) to enhance different Higgs boson production mechanisms. The dominant production mechanism contributing to the signal yield in the 0-jet category is ggH, while in the 1-jet category, it is ggH with initial-state radiation. The 2-jet category is further split into two based on the invariant mass of the two jets ($m_{jj}$). Optimizing the sensitivity resulted in $m_{jj}$ thresholds of 550 and 500 GeV for the $\mu\tau$ and e$\tau$ channels, respectively. The dominant production mechanism is ggH with initial-state radiation for events with $m_{jj} < 550$ GeV and $< 500$ GeV, while it is VBF for events with $m_{jj} > 550$ GeV and $> 500$ GeV for the $\mu\tau$ and e$\tau$ channels, respectively.
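The jet-based categorization can be summarized in a small sketch. The category labels, channel keys, and function name are hypothetical; the treatment of events with $m_{jj}$ exactly at the threshold is not specified in the text and is assigned here to the ggH-enriched category as an assumption.

```python
# m_jj thresholds in GeV separating the two 2-jet categories, per channel.
MJJ_THRESHOLD = {"mutau": 550.0, "etau": 500.0}


def categorize(n_jets, channel, m_jj=None):
    """Return a category label, or None if the event is not considered."""
    if channel not in MJJ_THRESHOLD:
        raise ValueError(f"unknown channel: {channel}")
    if n_jets < 0:
        raise ValueError("number of jets cannot be negative")
    if n_jets > 2:
        return None  # events with more than two jets are not considered
    if n_jets == 0:
        return "0-jet"
    if n_jets == 1:
        return "1-jet"
    if m_jj is None:
        raise ValueError("m_jj is required for 2-jet events")
    # Assumption: the boundary value itself goes to the ggH-enriched category.
    return "2-jet VBF" if m_jj > MJJ_THRESHOLD[channel] else "2-jet ggH"


# The same dijet mass lands in different categories in the two channels.
category_mutau = categorize(2, "mutau", m_jj=520.0)
category_etau = categorize(2, "etau", m_jj=520.0)
```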
A variable providing an estimate of $m_H$ using the observed decay products of the Higgs boson, the collinear mass, is defined as $m_{\text{col}} = m_{\text{vis}} / \sqrt{x_{\tau}^{\text{vis}}}$, where $m_{\text{vis}}$ is the visible invariant mass of the $\tau$-$\mu$ or $\tau$-e system and $x_{\tau}^{\text{vis}}$ is the fraction of the $\tau$ lepton $p_T$ carried by the visible decay products of the $\tau$ lepton ($\tau_{\text{vis}}$). The definition is based on the "collinear approximation" with the observation that, since $m_H \gg m_{\tau}$, the $\tau$ lepton decay products are Lorentz-boosted in the direction of the $\tau$ lepton [74]. The momentum of the neutrino(s) from the $\tau$ lepton decay can be approximated to have the same direction as the $\tau_{\text{vis}}$. The component of the $\vec{p}_T^{\,\text{miss}}$ in the direction of the $\tau_{\text{vis}}$ is used to estimate the transverse component of the neutrino momentum ($p_T^{\nu,\text{est}}$). This information is combined to estimate $x_{\tau}^{\text{vis}}$, which is defined as $x_{\tau}^{\text{vis}} = p_T^{\tau_{\text{vis}}} / (p_T^{\tau_{\text{vis}}} + p_T^{\nu,\text{est}})$. The collinear mass distributions of simulated signal, data, and backgrounds in each channel are shown in Fig. 1.
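The collinear mass construction can be sketched as below. The sketch assumes that the visible momentum fraction is the visible $\tau$ $p_T$ divided by the sum of the visible $p_T$ and the estimated neutrino $p_T$, with the latter taken as the projection of the missing transverse momentum onto the $\tau_{\text{vis}}$ direction. The function name and the choice to return `None` for unphysical configurations are hypothetical.

```python
import math


def collinear_mass(m_vis, tau_vis_pt, tau_vis_phi, met, met_phi):
    """Collinear mass m_vis / sqrt(x_vis), or None if x_vis is undefined.

    m_vis:       visible invariant mass of the lepton + tau_vis system (GeV)
    tau_vis_pt:  pT of the visible tau decay products (GeV)
    tau_vis_phi: azimuth of the visible tau decay products (rad)
    met, met_phi: magnitude and azimuth of the missing transverse momentum
    """
    # Neutrino pT estimate: component of pTmiss along the tau_vis direction.
    nu_pt_est = met * math.cos(met_phi - tau_vis_phi)
    denominator = tau_vis_pt + nu_pt_est
    if tau_vis_pt <= 0.0 or denominator <= 0.0:
        return None  # assumption: such events have no meaningful estimate
    x_vis = tau_vis_pt / denominator
    return m_vis / math.sqrt(x_vis)


# If the neutrinos carry as much pT as the visible products, x_vis = 0.5.
m_col = collinear_mass(100.0, 30.0, 0.3, 30.0, 0.3)
```

When the missing transverse momentum points away from the $\tau_{\text{vis}}$, the projection is negative and $x_{\tau}^{\text{vis}}$ exceeds one; how such events are treated in the analysis is not stated here, so the sketch leaves them unclipped.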
The transverse mass $m_T(\ell)$ is a variable constructed from the lepton $\vec{p}_T^{\,\ell}$ and the $\vec{p}_T^{\,\text{miss}}$ vectors: $m_T(\ell) = \sqrt{2 |\vec{p}_T^{\,\ell}| |\vec{p}_T^{\,\text{miss}}| (1 - \cos\Delta\phi_{\ell - p_T^{\text{miss}}})}$, where $\Delta\phi_{\ell - p_T^{\text{miss}}}$ is the angle in the transverse plane between the lepton and the $\vec{p}_T^{\,\text{miss}}$. It is used to discriminate the Higgs boson signal from the W+jets background. The $m_T(\ell)$ distribution for the signal defined using visible decay products of the $\tau$ lepton peaks at lower values, while it peaks at higher values for the W+jets background.
To improve discrimination between signal and background events, a BDT is trained using the TMVA toolkit of the ROOT analysis package [75]. A BDT is trained in each channel using a mixture of simulated signal events comprising the ggH and VBF processes, weighted according to their expected yield from SM production cross sections. In hadronic channels, the dominant sources of background come from the Z $\to\tau\tau$ process and events with misidentified leptons. The background used for training a BDT in the hadronic channels is obtained from data containing misidentified lepton events of the same electric charge for both the leptons and Z $\to\ell\ell$ ($\ell$ = e, $\mu$, $\tau$) simulated events with their applied signal selections. In leptonic channels, the dominant sources of background come from the Z $\to\tau\tau$ process, the $t\bar{t}$ process, and events with misidentified leptons. The background used for training a BDT in the leptonic channels is obtained from $t\bar{t}$ and Z $\to\ell\ell$ simulated events mixed and weighted according to their expected yield from SM production cross sections. Additional background for training comes from events with misidentified leptons in a control region (CR) in data, where the isolation requirements are inverted with the same electric charge for both the leptons. A detailed description of the different background processes and their estimation is given in Section 6.
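For the transverse mass introduced above, a minimal sketch follows. It assumes the standard massless form built from the lepton $p_T$, the missing transverse momentum, and the azimuthal angle between them; the function name is hypothetical.

```python
import math


def transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """m_T = sqrt(2 * pT_lepton * pT_miss * (1 - cos(delta_phi)))."""
    dphi = lepton_phi - met_phi
    arg = 2.0 * lepton_pt * met * (1.0 - math.cos(dphi))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(arg, 0.0))


# Back-to-back lepton and pTmiss (W-like topology) give a large m_T,
# while a collinear configuration gives m_T close to zero.
mt_back_to_back = transverse_mass(40.0, 0.0, 40.0, math.pi)
mt_collinear = transverse_mass(40.0, 1.2, 40.0, 1.2)
```

This makes the discrimination explicit: neutrinos roughly collinear with the visible $\tau$ decay products give a low $m_T$, whereas the lepton and neutrino from a W boson decay tend to be well separated in azimuth.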

The input variables to the BDT are listed separately for each channel below. The input variables are chosen based on their separation power as observed during training of the BDT. The trained BDT is validated in a dedicated background-enriched validation region (VR) for each channel, as detailed in Section 6. In all the channels, events containing additional electrons, muons, or $\tau_h$ candidates are vetoed. Also, events with at least one b-tagged jet are rejected to suppress the $t\bar{t}$ background. After applying the selections, a maximum likelihood fit is performed to the BDT discriminant distributions in each channel. The various systematic uncertainties are incorporated as nuisance parameters in the fit. The BDT discriminant distributions in all the channels are shown after determining the best fit values of the nuisance parameters from the fit to the signal-plus-background hypothesis, as discussed later in Section 7.

H $\to\mu\tau_h$
In this channel, the preselection requires a muon and $\tau_h$ of opposite electric charge with a separation of $\Delta R > 0.5$. The trigger requires the presence of an isolated muon with a $p_T$ threshold of 24 GeV. In 2017, this trigger is "prescaled", which means that only a fraction of the events selected will pass the trigger. Hence, it is used in conjunction with another trigger based on the presence of an isolated muon with a $p_T$ threshold of 27 GeV. The muon is required to have $p_T > 26$ GeV, $|\eta| < 2.1$, and $I_{\text{rel}}^{\mu} < 0.15$. The $\tau_h$ is required to have $p_T > 30$ GeV and $|\eta| < 2.3$. The selections for the $\mu\tau_h$ channel are summarized in Table 1.
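The kinematic part of the preselection in this channel can be expressed as a short sketch. The `Candidate` class and the function names are hypothetical; trigger matching, the $\tau_h$ identification working points, and the extra-lepton and b-jet vetoes are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    pt: float             # transverse momentum in GeV
    eta: float            # pseudorapidity
    phi: float            # azimuth in radians
    charge: int           # electric charge, +1 or -1
    rel_iso: float = 0.0  # relative isolation (used for the muon only)


def delta_r(a, b):
    """Angular separation with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)


def passes_mutau_h_preselection(muon, tau_h):
    """Kinematic preselection quoted in the text for the mu tau_h channel."""
    opposite_charge = muon.charge * tau_h.charge < 0
    separated = delta_r(muon, tau_h) > 0.5
    muon_ok = muon.pt > 26.0 and abs(muon.eta) < 2.1 and muon.rel_iso < 0.15
    tau_ok = tau_h.pt > 30.0 and abs(tau_h.eta) < 2.3
    return opposite_charge and separated and muon_ok and tau_ok


muon = Candidate(pt=35.0, eta=0.5, phi=0.0, charge=-1, rel_iso=0.05)
tau_h = Candidate(pt=40.0, eta=-0.3, phi=2.8, charge=+1)
selected = passes_mutau_h_preselection(muon, tau_h)
```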
The input variables to the BDT include $\Delta\phi(\tau_h, p_T^{\text{miss}})$ and $\Delta\phi(\mu, \tau_h)$, among others. The neutrino is assumed to be collinear with the $\tau_h$, which motivates using the $\Delta\phi(\tau_h, p_T^{\text{miss}})$ variable. The two leptons are usually produced in opposite directions in the azimuthal plane, which motivates using the $\Delta\phi(\mu, \tau_h)$ variable. The post-fit distributions of simulated signal, data, and backgrounds in each category of the $\mu\tau_h$ channel are shown in Fig. 2.

H $\to\mu\tau_e$
In this channel, the preselection requires a muon and electron of opposite electric charge with a separation of $\Delta R > 0.3$. The triggers require both a muon and an electron, where the muon has $p_T$ above 23 GeV [...] The neutrinos are assumed to be collinear with the electron, which motivates using the $\Delta\phi(\text{e}, p_T^{\text{miss}})$ variable. The two leptons are usually produced in opposite directions in the azimuthal plane, which motivates using the $\Delta\phi(\text{e}, \mu)$ variable. The post-fit distributions of simulated signal, data, and backgrounds in each category of the $\mu\tau_e$ channel are shown in Fig. 3.

H $\to$ e$\tau_h$
In this channel, the preselection requires an electron and $\tau_h$ of opposite electric charge with a separation of $\Delta R > 0.5$. The triggers require the presence of an isolated electron with a $p_T$ threshold of 25 GeV (2016), 27 GeV (2017), or 32 GeV (2018). In 2017 and 2018, the signal acceptance is increased by selecting events where the electron has $p_T$ above 24 GeV and the $\tau_h$ has $p_T$ above 30 GeV [...] The input variables are similar to those of the $\mu\tau_h$ channel, except for the addition of the variable $m_{\text{vis}}$ and the removal of $p_T^{\text{miss}}$. The variable $m_{\text{vis}}$ has better separation power here because the e$\tau_h$ channel has more Z $\to$ ee+jets background than the $\mu\tau_h$ channel has Z $\to\mu\mu$+jets background. The post-fit distributions of simulated signal, data, and backgrounds in each category of the e$\tau_h$ channel are shown in Fig. 4.

H $\to$ e$\tau_\mu$
In this channel, the preselection requires an electron and muon of opposite electric charge with a separation of $\Delta R > 0.4$. The triggers require both an electron and a muon, where the electron has $p_T$ above 23 GeV, and the muon has $p_T$ above 8 GeV. The electron is required to have $p_T > 24$ GeV, $|\eta| < 2.5$, and $I_{\text{rel}}^{e}$ [...] The input variables are similar to those of the $\mu\tau_e$ channel except for the [...]. The post-fit distributions of simulated signal, data, and backgrounds in each category of the e$\tau_\mu$ channel are shown in Fig. 5.

Background estimation
One of the major background contributions comes from the Z $\to\tau\tau$ process, in which the muon or electron arises from a $\tau$ lepton decay. The other major background contributions arise from the W+jets process and from multijet events produced through the strong interaction (referred to as QCD multijet events hereafter), where one or more of the jets are misidentified as leptons. These backgrounds are estimated from data either fully or with the aid of simulation. The $t\bar{t}$ and single top quark background contributes substantially in leptonic channels and is estimated using simulated events along with the other backgrounds. The background estimates are validated in different orthogonal VRs constructed to have enhanced contributions from specific backgrounds.

Z $\to\tau\tau$ background
The Z $\to\tau\tau$ background is estimated from data using an embedding technique [76]. This technique allows for an estimation of the genuine $\tau\tau$ SM backgrounds from data with reduced simulation input. This minimizes the uncertainties that arise from using simulation. Events with a pair of oppositely charged muons are selected in data so that Z $\to\mu\mu$ events largely dominate. These data events are selected independently of the event selection criteria described in Section 5. The muons are removed from the selected events and replaced with simulated $\tau$ leptons with the same kinematic properties as those of the replaced muons. In that way, a set of hybrid events is obtained that relies on simulation only for the decay of the $\tau$ leptons. The description of the underlying event or the production of associated jets is taken entirely from data. This technique results in a more accurate description of the $p_T^{\text{miss}}$ and jet-related variables than simulation and an overall reduction in the systematic uncertainties. Embedded events cover all backgrounds with two genuine $\tau$ leptons, and this includes a small fraction of $t\bar{t}$, diboson, and EW W/Z events. The simulated events from the $t\bar{t}$, diboson, and EW W/Z processes where both $\tau$ candidates match to $\tau$ leptons at the generator level are removed to avoid any double counting.

Misidentified lepton background
The misidentified lepton background corresponds to events where jets are misidentified as leptons. They mostly arise from two sources: W+jets and QCD multijet events. In W+jets background events, one of the leptons is from the W boson decay while the other is a jet misidentified as a lepton. In QCD multijet events, both the leptons are misidentified jets. In the $\mu\tau_h$ and e$\tau_h$ channels, the contributions from misidentified lepton backgrounds have been estimated using a "misidentification rate" approach. In the $\mu\tau_e$ and e$\tau_\mu$ channels, an "extrapolation factor" approach is adopted, which is consistent with the "misidentification rate" approach, and is used because of limited statistical precision in the leptonic channels.

Misidentification rate approach
The misidentified lepton background in the signal region (SR) is estimated using misidentification rates measured in a Z+jets CR and applied to a background-enriched region from collision data. The misidentification rates are evaluated using events with a Z boson and at least one jet that can be misidentified as a lepton. The probabilities with which jets are misidentified as an electron, muon, or $\tau_h$ are labeled as $f_e$, $f_\mu$, and $f_{\tau_h}$, respectively. The Z boson is formed using two muons with $p_T > 26$ GeV, $|\eta| < 2.4$, and $I_{\text{rel}}^{\mu} < 0.15$ for measuring the jet $\to\tau_h$, $\mu$, e misidentification rate. The muons are required to be oppositely charged and have an invariant mass between 70 and 110 GeV. The contribution from diboson events, where there is a genuine lepton, is subtracted using simulation.
The jet is required to pass the same lepton identification criteria as used in the SR. "Signal-like" and "background-like" regions are defined. For the "signal-like" region, the electron and muon isolation is required to be $I_{\text{rel}} < 0.15$, and the $\tau_h$ is discriminated against jets at a WP that has an identification efficiency of about 70%. For the "background-like" region, lepton isolation is required to be $0.15 < I_{\text{rel}}^{\mu} < 0.25$ and $0.15 < I_{\text{rel}}^{e} < 0.50$, and the $\tau_h$ is required to pass the WP that has an identification efficiency of about 80% but fail the WP that has an identification efficiency of about 70%. After the "signal-like" and "background-like" regions are defined, the misidentification rates are computed as functions of the lepton $p_T$. The misidentification rates $f_e$, $f_\mu$, and $f_{\tau_h}$ are estimated as $f_i = S_i / (B_i + S_i)$, where $S_i$ is the number of events in the "signal-like" region, while $B_i$ is the number of events in the "background-like" region. The $\tau_h$ misidentification rate shows a $p_T$ dependence that depends on the $\tau_h$ decay mode and $|\eta|$ and is therefore evaluated as a function of $p_T^{\tau}$ for the different decay modes and two $\eta$ regions ($|\eta| < 1.5$ or $|\eta| > 1.5$).
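A minimal sketch of the misidentification rate measurement follows. It assumes the rate is the "signal-like" yield divided by the sum of the "signal-like" and "background-like" yields, which is the form consistent with the $f/(1 - f)$ event weights used in this approach. The function names, the bin labels, and the toy yields are hypothetical.

```python
def misid_rate(n_signal_like, n_background_like):
    """Misidentification rate f = S / (S + B) for one bin."""
    total = n_signal_like + n_background_like
    if n_signal_like < 0 or n_background_like < 0 or total <= 0:
        raise ValueError("yields must be non-negative with a positive total")
    return n_signal_like / total


def misid_rates_by_bin(signal_like, background_like):
    """Rates per bin; both arguments map a bin label to a yield.

    For tau_h candidates the label could combine pT range, decay mode, and
    eta region; for electrons and muons it would be a pT range.
    """
    return {
        label: misid_rate(signal_like[label], background_like[label])
        for label in signal_like
    }


# Toy yields after subtracting genuine-lepton (diboson) contributions.
rates = misid_rates_by_bin(
    {"pt30to40": 40.0, "pt40to60": 30.0},
    {"pt30to40": 60.0, "pt40to60": 70.0},
)
```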
In the e$\tau_h$ channel, the $\tau_h$ misidentification rate is evaluated using events with a Z boson formed using two electrons with $p_T > 27$ GeV, $|\eta| < 2.5$, and $I_{\text{rel}}^{e} < 0.15$. The electrons must be oppositely charged and have an invariant mass between 70 and 110 GeV. The reason for using Z $\to$ ee events for evaluating the $\tau_h$ misidentification rate in the e$\tau_h$ channel is that the DNN WPs used for discriminating $\tau_h$ against electrons and muons are different in this channel compared to the $\mu\tau_h$ channel, as described in Section 4. The misidentification rates evaluated using this CR are compatible with the misidentification rates measured in Z $\to\mu\mu$ events. The computed misidentification rates $f_i$ depend on the lepton $p_T$ for electrons and muons or on $p_T$, $\eta$, and decay mode for the $\tau_h$ candidates. The misidentification rates for electrons and muons are $\approx$0.4 and $\approx$0.6, respectively, at $p_T = 30.0$ GeV. The misidentification rates for $\tau_h$ candidates are in the range 0.02-0.24 at $p_T = 30.0$ GeV. They are used to estimate the background yields and obtain the distributions of the misidentified lepton background. This is accomplished through the following procedure. Each event in the "background-like" region, defined using the collision data with the same selection as the SR, but loosening the isolation requirements on one of the leptons, is weighted by a factor $f_i/(1 - f_i)$. Events with the possibility of double counting because of two misidentified leptons are subtracted. For example, events with both a misidentified muon or electron and a misidentified $\tau_h$ are counted once in the "background-like" region for the muon or electron with a weight $f_\ell/(1 - f_\ell)$ and another time in the "background-like" region for the $\tau_h$ with a weight $f_\tau/(1 - f_\tau)$, and are hence double counted. This is mitigated by subtracting their contribution once using a weight $f_\ell f_\tau / [(1 - f_\ell)(1 - f_\tau)]$.
The background estimate is validated in a VR by requiring the two leptons to have the same electric charge, enhancing the misidentified lepton background. Figure 6 (left) shows the comparison of data with background estimates in this VR for the $\mu\tau_h$ channel. The background estimate is also validated in a W boson enriched VR, as shown in Fig. 6 (middle). This VR is obtained by applying the preselection, $m_T(\ell, p_T^{\text{miss}}) > 60$ GeV ($\ell$ = e or $\mu$), and $m_T(\tau_h, p_T^{\text{miss}}) > 80$ GeV.
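The weighting procedure for the misidentified lepton background can be illustrated as follows. The sketch assumes that events in which both the light lepton and the $\tau_h$ fall in their "background-like" regions enter with a negative weight equal to the product of the two single-object weights, which is one way to subtract the double-counted contribution once. The function names and the boolean event flags are hypothetical.

```python
def fake_weight(f):
    """Single-object weight f / (1 - f) for a misidentification rate f."""
    if not 0.0 <= f < 1.0:
        raise ValueError("misidentification rate must be in [0, 1)")
    return f / (1.0 - f)


def event_weight(lepton_background_like, tau_background_like, f_lepton, f_tau):
    """Weight of one data event in the background-like selection.

    lepton_background_like: the muon/electron fails the signal-like isolation
    tau_background_like:    the tau_h fails the signal-like identification
    """
    w_lepton = fake_weight(f_lepton)
    w_tau = fake_weight(f_tau)
    if lepton_background_like and tau_background_like:
        # Assumption: subtract the double-counted contribution once.
        return -w_lepton * w_tau
    if lepton_background_like:
        return w_lepton
    if tau_background_like:
        return w_tau
    return 0.0  # signal-like events do not enter the estimate


# Toy estimate from three events, with f_lepton = 0.4 and f_tau = 0.2.
events = [(True, False), (False, True), (True, True)]
estimate = sum(event_weight(lep, tau, 0.4, 0.2) for lep, tau in events)
```

In practice the rates would be looked up per event from the measured $p_T$-, $\eta$-, and decay-mode-dependent values rather than passed as constants.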

Extrapolation factor approach
In the eτ µ and µτ e channels, the QCD multijet background is estimated from the data using events with an electron and a muon with the same electric charge [77].Contributions from other processes are estimated from simulation and subtracted from the data.Extrapolation factors from the CR requiring the two leptons to have the same electric charge to the SR are measured in data as a function of the jet multiplicity and the ∆R separation between the electron and muon.
The extrapolation factors are estimated using events with a muon failing the isolation requirement and an isolated electron.The contribution from bb events to the QCD multijet background gives rise to the ∆R dependence and is parameterized with a linear function.The extrapolation factors are higher for events with low ∆R separation between the electron and muon, decreasing as the ∆R separation increases.The extrapolation factors also depend on the electron and muon p T .This p T dependence comes from the leptons arising from the semileptonic c quark decay.These leptons tend to be softer in p T and less isolated, resulting in a reduction in the number of such events passing the p T and isolation requirements.
As the extrapolation factors are measured in a CR where the muon fails the isolation requirement, an additional correction is applied to cover a potential mismodeling. This correction is calculated by measuring the extrapolation factors in two different CRs. The first CR has events where the muon is isolated and the electron fails the isolation requirement. The second CR has events where both the electron and the muon fail the isolation requirement. The ratio of the extrapolation factors from these two CRs is taken as the correction to account for the potential mismodeling induced by requiring the muon to fail the isolation requirement.
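A minimal sketch of how a linear ∆R parameterization and the isolation correction could be combined into a per-event weight is given below. The function names, the fit parameters, and the direction of the ratio (first CR divided by second CR) are assumptions made for illustration; the dependence on jet multiplicity and lepton p T is omitted.

```python
def linear_extrapolation_factor(delta_r, intercept, slope):
    """Extrapolation factor modeled as a straight line in the electron-muon Delta R."""
    return intercept + slope * delta_r


def isolation_correction(factor_mu_isolated, factor_both_anti_isolated):
    """Ratio of the factors measured in the two CRs.

    factor_mu_isolated:        CR with an isolated muon and a non-isolated electron
    factor_both_anti_isolated: CR with both leptons failing the isolation requirement
    The direction of the ratio is an assumption of this sketch.
    """
    if factor_both_anti_isolated == 0:
        raise ValueError("denominator extrapolation factor must be nonzero")
    return factor_mu_isolated / factor_both_anti_isolated


def qcd_event_weight(delta_r, intercept, slope, correction):
    """Weight applied to a same-charge data event to model the QCD multijet background."""
    return linear_extrapolation_factor(delta_r, intercept, slope) * correction
```

A negative `slope` reproduces the behavior described in the text, with larger factors at low ∆R that decrease as the separation grows.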

Other backgrounds
Other background contributions come from processes in which a lepton pair is produced from the weak decays of quarks and vector bosons. These include tt̄, WW, WZ, and ZZ events.
There are also nonnegligible contributions from processes such as Wγ(*)+jets, single top quark production, and Z → ℓℓ (ℓ = e, µ). Figure 6 (right) shows the comparison of data with background estimates in the tt̄ VR for the µτ e channel. This VR is defined by requiring the presence of at least one b-tagged jet in the event in addition to the preselection. The SM Higgs boson production contribution mainly comes from H → ττ and H → WW decays.

Systematic uncertainties
Several sources of experimental and theoretical systematic uncertainties are taken into account in the statistical analysis. These uncertainties affect both the normalization and the distribution of the different processes. The systematic uncertainties are incorporated in the likelihood as nuisance parameters for which log-normal a priori distributions are assumed, and distribution variations are taken into account via continuous morphing [78]. A maximum likelihood fit and the profile likelihood in the asymptotic approximation are then used to obtain the best fit branching fraction and the upper limits on the branching fraction for the LFV Higgs boson decays. As the search is categorized into different final states, partial and complete correlations between the uncertainties in different categories are taken into account and are summarized in Table 3.
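As an illustration of a log-normal normalization uncertainty, the sketch below uses one common convention in which the yield is multiplied by κ^θ, with κ = 1 + (relative uncertainty) and θ a nuisance parameter with a unit Gaussian constraint. This convention and the function names are assumptions made here; the text does not specify the implementation.

```python
def lognormal_scaled_yield(nominal, kappa, theta):
    """Yield after shifting a log-normal nuisance parameter by theta standard deviations."""
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    return nominal * kappa ** theta


def constraint_penalty(theta):
    """Contribution of the unit Gaussian constraint on theta to -2 ln L, up to a constant."""
    return theta ** 2
```

For a 2% uncertainty (κ = 1.02), a shift of θ = +1 raises a nominal yield of 100 events to 102, while θ = −1 lowers it to 100/1.02, so the yield remains positive for any value of θ.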
The uncertainties in the τ h reconstruction and identification efficiency are measured for different p T ranges using a tag-and-probe method [79] and found to be in the range of 2-3%. The uncertainties for different ranges of p T are treated as uncorrelated. These uncertainties are also considered for the embedded ττ background, where they are treated as 50% correlated with the simulation uncertainties. For the embedded events, triggering on muons before they are replaced by τ leptons leads to an uncertainty in the trigger efficiency of about 4%, which is treated as uncorrelated between the three years because of the different triggering criteria. Two further effects need to be considered for the embedded events. First, the embedded events have a higher track reconstruction efficiency because the reconstruction is performed in an empty detector environment. Second, the energy deposits of the replaced muons can cause event migration for τ h decay modes with a π 0 . Data-to-simulation scale factors cover these effects, with corresponding systematic uncertainties.
Uncertainties arising from an electron or a muon misidentified as a τ h range from 7 to 40% or from 10 to 70%, respectively, for different bins of p T , η, and τ h decay mode. The uncertainty in the τ h energy scale ranges from 0.7 to 1.2%; it is treated as uncorrelated for different decay modes and 50% correlated between embedded and simulated backgrounds. The uncertainty in the electron energy scale and the muon momentum scale for misidentified leptons is independent of the τ h energy scale and amounts to 7% and 1%, respectively. The effect of lepton energy resolution is found to be negligible.
The jet energy scale is affected by several sources, and its uncertainty is evaluated as a function of p T and η. The effect of the jet energy scale is propagated to the BDT discriminant and varies from 3 to 20% [72]. The uncertainties in the jet energy resolution are also taken into account and mostly impact the m jj -defined categories. Jets with p T < 10 GeV are treated as unclustered energy. The unclustered energy scale is considered independently for charged particles, neutral hadrons, photons, and very forward particles; these uncertainties affect both the distributions and the total yields and are treated as uncorrelated. The efficiency to classify a jet as b-tagged is different in data and simulation, and scale factors that depend on jet p T are used to correct the simulation. The uncertainties in the measured values of these scale factors are taken as sources of systematic uncertainty.
The uncertainties in the reconstruction of electrons and muons, along with their isolation criteria, are measured using the tag-and-probe method in data in Z → ee and Z → µµ events and amount to about 2% [64,80,81]. The uncertainty in the measurement of the muon momentum scale is in the range 0.4-2.7% for different |η| ranges, while for the electron momentum scale it is less than 1%. The selection of events using electron- and muon-based triggers results in an additional 2% uncertainty in the yield of simulated processes. In the eτ h channel, an additional 5% uncertainty is associated with using the trigger requiring the presence of both an electron and a τ h in 2017 and 2018. The uncertainties related to the lepton identification and momentum scale are treated as correlated between the three years, while the uncertainties related to the triggering are treated as uncorrelated.
The misidentification rates in the eτ h and µτ h final states are parameterized using a linear function dependent on τ h p T , where two uncertainties are ascribed per fit function. The normalization uncertainties in the estimates of the misidentified lepton backgrounds (jet → τ h , µ, e) from data are taken from the VR, which is defined orthogonally to the SR. An additional uncertainty is estimated for the misidentified lepton background in the W boson enriched VR. It is parameterized as a function of ∆φ(µ, p miss T ) for the µτ h channel and as a function of ∆φ(e, p miss T ) for the eτ h channel. Discriminants with different signal-to-background ratios are used to differentiate τ h against electrons and muons, which entails an additional 3% uncertainty for the eτ h channel.
The misidentified lepton background in the eτ µ and µτ e final states is affected by different uncertainties. The statistical uncertainties arising from both fits of the extrapolation factors, as a function of the lepton p T and of the spatial separation between the electron and muon, are taken into account. The uncertainty in the extrapolation factors resulting from inverting the muon isolation is also taken into account. These uncertainties have a combined effect of about 20% on the normalization. The dominant source of uncertainty in the simulated background processes, Z → ee, Z → µµ, Z → ττ, WW, ZZ, Wγ, tt̄, and single top quark production, is the measurement of the cross section for these processes, which is treated as correlated between the three years.
The theoretical uncertainties affecting the measurement of the Higgs boson production cross section are the QCD scales (renormalization and factorization scales), the choice of PDFs, and the strong coupling constant (α S ) evaluated at the Z boson mass. These uncertainties affect the normalization of the signal and are treated as correlated between the three years [82]. Variations of the QCD scales yield 3.9, 0.5, 0.9, and 0.8% uncertainties in the ggH, VBF, ZH, and WH cross sections, respectively, while variations of the PDFs and α S result in 3.2, 2.1, 1.3, and 1.9% uncertainties, respectively. The effect on the acceptance is taken into account when the QCD scales, the PDFs, and α S are varied.
The normalization of the event yield for H → ττ is taken from simulation. The uncertainty in B(H → ττ) includes contributions of 1.70% from missing higher-order corrections, 0.99% from the quark masses, and 0.62% from α S . The normalization of the event yield for H → WW is also taken from simulation. The uncertainty in B(H → WW) includes contributions of 0.99% from missing higher-order corrections, 0.99% from the quark masses, and 0.66% from α S .
The bin-by-bin uncertainties account for the statistical uncertainties in each bin of the distributions of every process. The Barlow-Beeston Lite [83] approach is used, which assigns a single parameter to scale the sum of the process yields in each bin, constrained by the total uncertainty, instead of requiring separate parameters, one per process. This reduces the number of parameters required in the statistical analysis. The bin-by-bin uncertainties are treated as uncorrelated between bins, categories, and channels.
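The idea of a single parameter per bin can be sketched as below: the statistical uncertainties of all processes in a bin are added in quadrature, and one parameter shifts the summed yield by that total uncertainty. This is a simplified illustration with hypothetical function names, assuming a Gaussian-constrained parameter per bin; it is not the implementation used in the analysis.

```python
import math


def combine_bin_uncertainties(yields_by_process, errors_by_process):
    """Return per-bin total yields and total statistical uncertainties.

    Both arguments are lists with one entry per process; each entry is a
    list with one value per bin.
    """
    n_bins = len(yields_by_process[0])
    totals, total_errors = [], []
    for b in range(n_bins):
        totals.append(sum(process[b] for process in yields_by_process))
        # Independent statistical uncertainties add in quadrature.
        total_errors.append(math.sqrt(sum(process[b] ** 2 for process in errors_by_process)))
    return totals, total_errors


def shifted_bin_total(total, total_error, x):
    """Summed bin yield for a single per-bin parameter x (x = 0 is nominal)."""
    return total + x * total_error
```

With two processes and two bins, this uses two parameters in total rather than four, which is the reduction the approach is designed to achieve.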
The integrated luminosities of the 2016, 2017, and 2018 data taking periods are individually known with uncertainties in the 2.3-2.5% range [84][85][86], while the total integrated luminosity has an uncertainty of 1.8%, the improvement in precision reflecting the uncorrelated time evolution of some systematic effects. The uncertainty in the integrated luminosity affects all processes whose normalization is taken directly from the simulation. The uncertainty related to pileup is evaluated by varying the weights applied to the simulation and is treated as correlated between the three years. The variation of the weights is obtained through a 5% change in the total inelastic cross section used to estimate the number of pileup events in data. Other minimum-bias event modeling and initial- and final-state radiation uncertainties are estimated to be much smaller than those on the rate and are therefore neglected.
During the 2016 and 2017 data taking periods, a gradual shift in the timing of the inputs from the ECAL first-level trigger in the region of |η| > 2.0 caused a specific trigger inefficiency. For events containing an electron or a jet with respective p T > 50 GeV or > 100 GeV, in the region 2.5 < |η| < 3.0 the efficiency loss is 10-20%, depending on p T , η, and time. Correction factors are computed from data and applied to the acceptance evaluated through simulation. The uncertainty due to this correction factor is ≈1% and is treated as correlated between the two years.

Results
No significant excess has been found for the LFV Higgs boson decays in either channel, and upper limits have been placed. Upper limits on the branching fractions of the Higgs boson decays are computed using the modified frequentist approach with the CL s criterion, taking the profile likelihood as a test statistic [87][88][89] in the asymptotic approximation. The observed (expected) upper limits on the Higgs boson branching fractions are 0.15 (0.15)% for H → µτ and 0.22 (0.16)% for H → eτ at the 95% CL. The results have a dominant contribution from systematic uncertainties. The bin-by-bin uncertainties and the uncertainties related to the distribution of the misidentified lepton background have a significant impact, followed by the lepton energy scale uncertainties.
The upper limits and the best fit branching fractions for B(H → µτ) and B(H → eτ) are reported in Tables 4 and 5. The limits are also summarized in Table 6 and graphically shown in Fig. 7. The limits are improved from previous results [30]. The improvement relies on the larger data set, the updated background estimation techniques, and the BDT classification. The results are cross-checked with an additional investigation following the strategy in Ref. [30] and are found to be consistent.
The upper limits on B(H → µτ) and B(H → eτ) are subsequently used to put constraints on LFV Yukawa couplings [11]. The LFV decays eτ and µτ arise at tree level from the assumed flavor violating Yukawa interactions, $Y_{\ell^{\alpha}\ell^{\beta}}$, where $\ell^{\alpha}$, $\ell^{\beta}$ are the leptons of different flavors ($\ell^{\alpha} \neq \ell^{\beta}$). The decay widths $\Gamma(\mathrm{H}\to\ell^{\alpha}\ell^{\beta})$ in terms of the Yukawa couplings are given by:

$$\Gamma(\mathrm{H}\to\ell^{\alpha}\ell^{\beta}) = \frac{m_{\mathrm{H}}}{8\pi}\left(|Y_{\ell^{\alpha}\ell^{\beta}}|^{2} + |Y_{\ell^{\beta}\ell^{\alpha}}|^{2}\right),$$

and the branching fractions are given by:

$$\mathcal{B}(\mathrm{H}\to\ell^{\alpha}\ell^{\beta}) = \frac{\Gamma(\mathrm{H}\to\ell^{\alpha}\ell^{\beta})}{\Gamma(\mathrm{H}\to\ell^{\alpha}\ell^{\beta}) + \Gamma_{\mathrm{SM}}}.$$

The SM Higgs boson decay width is assumed to be $\Gamma_{\mathrm{SM}}$ = 4.1 MeV [90] for m H = 125 GeV. The 95% CL upper limit on the Yukawa couplings obtained from the expression for the branching fraction above is shown in Table 6. The limits on the Yukawa couplings are shown in Fig. 8.
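As a numerical cross-check, the sketch below converts a branching fraction limit into a limit on the combined Yukawa coupling, assuming the tree-level relations Γ = m H (|Y αβ|² + |Y βα|²)/(8π) and B = Γ/(Γ + Γ SM ), with Γ SM = 4.1 MeV and m H = 125 GeV as quoted in the text. The function names are illustrative only.

```python
import math

M_H_GEV = 125.0        # Higgs boson mass assumed in the text
GAMMA_SM_GEV = 4.1e-3  # SM Higgs boson width assumed in the text (4.1 MeV)


def lfv_width(branching_fraction, gamma_sm=GAMMA_SM_GEV):
    """Invert B = Gamma / (Gamma + Gamma_SM) to obtain the LFV partial width in GeV."""
    if not 0.0 <= branching_fraction < 1.0:
        raise ValueError("branching fraction must be in [0, 1)")
    return branching_fraction * gamma_sm / (1.0 - branching_fraction)


def yukawa_limit(branching_fraction, m_h=M_H_GEV, gamma_sm=GAMMA_SM_GEV):
    """Return sqrt(|Y_ab|^2 + |Y_ba|^2) corresponding to a branching fraction."""
    width = lfv_width(branching_fraction, gamma_sm)
    return math.sqrt(8.0 * math.pi * width / m_h)


if __name__ == "__main__":
    for label, limit in (("mu tau", 0.0015), ("e tau", 0.0022)):
        print(f"{label}: {yukawa_limit(limit):.2e}")
```

Under these assumptions, the observed limits of 0.15% and 0.22% correspond to about 1.11×10⁻³ and 1.35×10⁻³, matching the coupling limits quoted in the paper.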

Summary
A search for lepton-flavor violation has been performed in the µτ and eτ final states of the Higgs boson in data collected by the CMS experiment. The data correspond to an integrated luminosity of 137 fb −1 of proton-proton collisions at a center-of-mass energy of 13 TeV. The results are extracted through a maximum likelihood fit to a boosted decision tree output, trained to distinguish the expected signal from backgrounds. The observed (expected) upper limits on the branching fraction of the Higgs boson to µτ are 0.15 (0.15)% and to eτ are 0.22 (0.16)% at 95% confidence level. Upper limits on the off-diagonal µτ and eτ Yukawa couplings are derived from these constraints, $\sqrt{|Y_{\mu\tau}|^{2} + |Y_{\tau\mu}|^{2}} < 1.11\times10^{-3}$ and $\sqrt{|Y_{e\tau}|^{2} + |Y_{\tau e}|^{2}} < 1.35\times10^{-3}$. These results constitute an improvement over the previous limits from the CMS and ATLAS experiments.

Figure 1: Collinear mass distributions for the data and background processes. B(H → µτ) = 20% and B(H → eτ) = 20% are assumed for the two signal processes. The channels are H → µτ h (upper row left), H → µτ e (upper row right), H → eτ h (lower row left), and H → eτ µ (lower row right). The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the statistical and systematic uncertainties are added in quadrature.

Figure 2: BDT discriminant distributions for the data and background processes in the H → µτ h channel. B(H → µτ) = 20% is assumed for the signal. The channel categories are 0 jets (upper row left), 1 jet (upper row right), 2 jets ggH (lower row left), and 2 jets VBF (lower row right). The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the post-fit statistical and systematic uncertainties are added in quadrature.

Figure 3: BDT discriminant distributions for the data and background processes in the H → µτ e channel. B(H → µτ) = 20% is assumed for the signal. The channel categories are 0 jets (upper row left), 1 jet (upper row right), 2 jets ggH (lower row left), and 2 jets VBF (lower row right). The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the post-fit statistical and systematic uncertainties are added in quadrature.

Figure 4: BDT discriminant distributions for the data and background processes in the H → eτ h channel. B(H → eτ) = 20% is assumed for the signal. The channel categories are 0 jets (upper row left), 1 jet (upper row right), 2 jets ggH (lower row left), and 2 jets VBF (lower row right). The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the post-fit statistical and systematic uncertainties are added in quadrature.

Figure 5: BDT discriminant distributions for the data and background processes in the H → eτ µ channel. B(H → eτ) = 20% is assumed for the signal. The channel categories are 0 jets (upper row left), 1 jet (upper row right), 2 jets ggH (lower row left), and 2 jets VBF (lower row right). The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the post-fit statistical and systematic uncertainties are added in quadrature.

Figure 6: The m col distribution in the VR with the same electric charge for both leptons (left), the W+jets VR (middle), and the tt̄ VR (right). In each distribution, the dominant background of the VR is shown, and all the other backgrounds are grouped into "Other bkg.". B(H → µτ) = 20% is assumed for the signal. The lower panel in each plot shows the ratio of data and estimated background. The uncertainty band corresponds to the background uncertainty in which the post-fit statistical and systematic uncertainties are added in quadrature.

Figure 8: Expected (red line) and observed (black solid line) 95% CL upper limits on the LFV Yukawa couplings, |Y µτ | vs. |Y τµ | (left) and |Y eτ | vs. |Y τe | (right). The |Y µτ | or |Y eτ | couplings correspond to a left chiral muon or electron and a right chiral τ lepton, while the |Y τµ | or |Y τe | couplings correspond to a left chiral τ lepton and a right chiral muon or electron. In the left plot, the expected limit is covered by the observed limit as they have similar values. The flavor diagonal Yukawa couplings are approximated by their SM values. The green and yellow bands indicate the range that is expected to contain 68% and 95% of all observed limit variations from the expected limit. The shaded regions are constraints obtained from null searches for τ → 3µ or τ → 3e (dark blue) [92] and τ → µγ or τ → eγ (purple) [93]. The blue diagonal line is the theoretical naturalness limit $|Y_{ij}Y_{ji}| = m_{i}m_{j}/v^{2}$ [11].

dia; the Ministry of Science and Higher Education and the National Science Center, contracts Opus 2014/15/B/ST2/03998 and 2015/19/B/ST2/02861 (Poland); the National Priorities Research Program by Qatar National Research Fund; the Ministry of Science and Higher Education, project no. 0723-2020-0041 (Russia); the Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia María de Maeztu, grant MDM-2015-0509 and the Programa Severo Ochoa del Principado de Asturias; the Thalis and Aristeia programs cofinanced by EU-ESF and the Greek NSRF; the Rachadapisek Sompot Fund for Postdoctoral Fellowship, Chulalongkorn University and the Chulalongkorn Academic into Its 2nd Century Project Advancement Project (Thailand); the Kavli Foundation; the Nvidia Corporation; the SuperMicro Corporation; the Welch Foundation, contract C-1845; and the Weston Havens Foundation (USA).

GeV, and the electron has p T above 12 GeV. The muon is required to have p T > 24 GeV, |η| < 2.4, and I µ rel < 0.15. The electron is required to have p T > 13 GeV, |η| < 2.5, and I e rel < 0.1. The selections for the µτ e channel are summarized in Table 1. The input variables to the BDT are p

Table 1: Event selection criteria for the H → µτ channels.

GeV. The electron is required to have p T > 27 GeV, |η| < 2.1, and I e rel < 0.15. The τ h is required to have p T > 30 GeV and |η| < 2.3. The selections for the eτ h channel are summarized in Table 2. The input variables to the BDT are p T , m col , m vis , m T (τ, p miss T ), ∆η(e, τ h ), ∆φ(e, τ h ), and ∆φ(τ h , p miss T ).

Table 2: Event selection criteria for the H → eτ channels.

Table 3: Systematic uncertainties in the expected event yields. All uncertainties are treated as correlated among categories, except those with two values separated by the ⊕ sign. In this case, the first value is the correlated uncertainty and the second value is the uncorrelated uncertainty for each category.

Figure 7: Observed (expected) 95% CL upper limits on B(H → µτ) (left) and B(H → eτ) (right) for each individual category and combined. The categories from top to bottom row are µτ h 0 jets, µτ h 1 jet, µτ h 2 jets, µτ h VBF, µτ e 0 jets, µτ e 1 jet, µτ e 2 jets, µτ e VBF, and µτ combined (left) and eτ h 0 jets, eτ h 1 jet, eτ h 2 jets, eτ h VBF, eτ µ 0 jets, eτ µ 1 jet, eτ µ 2 jets, eτ µ VBF, and eτ combined (right).

Table 6: Summary of observed and expected upper limits at 95% CL, best fit branching fractions, and corresponding constraints on Yukawa couplings for the H → µτ and H → eτ channels.