Search for lepton-flavor-violating decays of the Z boson into a τ lepton and a light lepton with the ATLAS detector

Direct searches for lepton flavor violation in decays of the Z boson with the ATLAS detector at the LHC are presented. Decays of the Z boson into an electron or muon and a hadronically decaying τ lepton are considered. The searches are based on a data sample of proton-proton collisions collected by the ATLAS detector in 2015 and 2016, corresponding to an integrated luminosity of 36.1 fb⁻¹ at a center-of-mass energy of √s = 13 TeV. No statistically significant excess of events above the expected background is observed, and upper limits on the branching ratios of lepton-flavor-violating decays are set at the 95% confidence level: B(Z → eτ) < 5.8 × 10⁻⁵ and B(Z → µτ) < 2.4 × 10⁻⁵. This is the first limit on B(Z → eτ) with ATLAS data. The upper limit on B(Z → µτ) is combined with a previous ATLAS result based on 20.3 fb⁻¹ of proton-proton collision data at a center-of-mass energy of √s = 8 TeV, and the combined upper limit at 95% confidence level is B(Z → µτ) < 1.3 × 10⁻⁵. DOI: 10.1103/PhysRevD.98.092010


Introduction
One of the main goals of the physics program of the Large Hadron Collider (LHC) at CERN is to discover physics beyond the Standard Model (SM). The observation of lepton flavor violation in decays of the Z boson into a pair of leptons of different flavors would give a clear indication of new physics. These decays can occur within the SM only via neutrino oscillations and would have a rate too small to be detected [1]. This paper presents searches by the ATLAS Collaboration for lepton-flavor-violating (LFV) decays of the Z boson into a τ-lepton and an electron or a muon, hereafter referred to as a light lepton or ℓ. Only final states with a hadronically decaying τ-lepton are considered.
The searches for LFV Z decays presented in this paper use a data sample of proton-proton collisions collected at a center-of-mass energy of √s = 13 TeV with the ATLAS detector at the LHC. These data correspond to an integrated luminosity of 36.1 fb⁻¹. The signal model used assumes unpolarized τ-leptons. Events are classified using neural networks, and the output distribution is used in a template fit to data to extract the LFV branching ratios of the Z boson, or otherwise to set upper limits on these values. The major backgrounds to the search are reducible backgrounds such as W+jets, top-quark pair production and Z → ℓℓ, and the irreducible background Z → ττ → ℓ + hadrons + 3ν. Reducible backgrounds from events with a quark- or gluon-initiated jet misidentified as a hadronically decaying τ-lepton, so-called "fakes", are estimated via a data-driven method. The reducible backgrounds from Z → ℓℓ, where one light lepton fakes a hadronic τ-lepton decay signature, are estimated using simulation. An event selection specifically designed to reduce the contribution from this background is applied. The shape of the template for the irreducible background from Z → ττ is estimated from simulation and its magnitude is determined in the fit to data.
The results of the search for the LFV Z → µτ decays presented in this paper are combined with the previous ATLAS results based on 8 TeV data. This paper is structured as follows. Section 2 briefly describes the ATLAS detector and the reconstruction of the detected particles. Section 3 details the data sample and the simulations used in the analysis. Section 4 describes the event selection and classification criteria. Section 5 discusses the methodology used to estimate the yield of events from background sources, and Section 6 lists the experimental and theoretical systematic uncertainties affecting the analysis. The statistical interpretation of the observed data and the results are presented in Section 7. The combination of the result in the Z → µτ channel with the previous ATLAS result from 8 TeV data is also presented. Finally, Section 8 summarizes the analysis.
of pileup, a jet vertex tagger (JVT) algorithm is used for jets with p_T < 60 GeV and |η| < 2.4. The JVT algorithm employs a multivariate technique based on jet energy, vertexing, and tracking variables in order to determine the likelihood that jets originate from or are heavily contaminated by pileup [17].
In order to identify jets containing b-hadrons (b-jets), a multivariate algorithm is used that depends on the presence of tracks with a large impact parameter with respect to the primary vertex [18], on the presence of displaced secondary vertices, and on the reconstructed flight paths of b- and c-hadrons associated with the jet [19]. Using this algorithm, jets are b-tagged if they satisfy criteria tuned to produce a 77% b-jet efficiency in simulated tt̄ events.
Hadronic τ-lepton decays result in a neutrino and a set of visible decay products (τ_had-vis), typically one or three charged pions and up to two neutral pions [20]. The reconstruction of the visible decay products [21] is seeded by jets. Selected τ_had-vis candidates are required to have p_T > 20 GeV, |η| < 2.5 excluding 1.37 < |η| < 1.52, one (1-prong) or three (3-prong) associated tracks with p_T > 1 GeV, and an electric charge of ±1. A boosted decision tree (BDT) identification procedure that is based on calorimetric shower shapes and tracking information is used to discriminate τ-lepton decays from jet backgrounds [22,23]. All events used in this analysis must have a τ_had-vis candidate that passes the "loose" identification working point. For events in the signal region, the τ_had-vis candidate must satisfy the "tight" identification criterion. Selected events that are not in the signal region are used to estimate backgrounds (Section 5). The combined reconstruction and identification efficiencies for "loose" and "tight" criteria are 60% (50%) and 45% (30%) for 1-prong (3-prong) hadronic τ-lepton decays, and are independent of the τ_had-vis p_T and the number of pileup interactions. To reduce the number of muons misidentified as τ_had-vis, a τ_had-vis candidate is excluded if it is within ∆R = 0.2 of a reconstructed muon with p_T > 2 GeV. An additional BDT, denoted hereafter by eBDT, is used to reduce the number of electrons misidentified as τ_had-vis, providing 85% (95%) efficiency for 1-prong (3-prong) hadronic τ-lepton decays. The leading-p_T candidate is selected as the τ_had-vis candidate, while any other candidates are considered to be jets.
Objects that overlap geometrically are removed in the following order: (a) jets within ∆R = 0.2 of selected τ_had-vis candidates are excluded; (b) jets within ∆R = 0.4 of an electron or a muon are excluded; (c) any τ_had-vis within ∆R = 0.2 of an electron or a muon is excluded; and (d) electrons within ∆R = 0.2 of a muon are excluded.
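As an illustration only, the removal sequence (a)-(d) could be sketched as follows. The dictionary keys `eta` and `phi`, the function names, the reading of "within" as a strict ∆R below the threshold, and the choice that each step sees the collections as left by the previous step are all assumptions made for this sketch, not details of the analysis software.

```python
import math

def delta_r(a, b):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi)."""
    d_eta = a["eta"] - b["eta"]
    d_phi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

def remove_overlaps(jets, taus, electrons, muons):
    """Apply steps (a)-(d) in order; returns the surviving (jets, taus, electrons)."""
    leptons = electrons + muons
    # (a) drop jets within dR = 0.2 of a selected tau_had-vis candidate
    jets = [j for j in jets if all(delta_r(j, t) >= 0.2 for t in taus)]
    # (b) drop jets within dR = 0.4 of an electron or a muon
    jets = [j for j in jets if all(delta_r(j, l) >= 0.4 for l in leptons)]
    # (c) drop tau_had-vis candidates within dR = 0.2 of an electron or a muon
    taus = [t for t in taus if all(delta_r(t, l) >= 0.2 for l in leptons)]
    # (d) drop electrons within dR = 0.2 of a muon
    electrons = [e for e in electrons if all(delta_r(e, m) >= 0.2 for m in muons)]
    return jets, taus, electrons
```

Because the steps run in the stated order, a jet can be removed by a τ_had-vis candidate that is itself removed later in step (c); whether the analysis behaves this way is not stated in the text.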
The missing transverse momentum, with magnitude E_T^miss, is calculated as the negative vectorial sum of the transverse momenta of all fully reconstructed and calibrated physics objects [24]. The procedure includes a "soft term", calculated from inner-detector tracks that originate from the hard-scattering vertex but are not matched to a reconstructed object.
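The negative vector sum described above can be written in a few lines. This is a minimal sketch: the `(pt, phi)` tuple layout, the `soft_term` argument given as Cartesian components, and the function name are hypothetical, and no calibration is modeled.

```python
import math

def missing_transverse_momentum(objects, soft_term=(0.0, 0.0)):
    """Return (magnitude, phi) of the missing transverse momentum.

    objects   : list of (pt, phi) for reconstructed, calibrated objects
    soft_term : (px, py) summed over unmatched hard-scatter tracks (assumed layout)
    """
    px = sum(pt * math.cos(phi) for pt, phi in objects) + soft_term[0]
    py = sum(pt * math.sin(phi) for pt, phi in objects) + soft_term[1]
    # The missing momentum balances the visible sum, hence the sign flip.
    return math.hypot(px, py), math.atan2(-py, -px)
```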

Data and simulated event samples
This search analyzes proton-proton collisions recorded by the ATLAS detector at the LHC during 2015 and 2016 at a center-of-mass energy of √s = 13 TeV. The data correspond to a total integrated luminosity of 36.1 fb⁻¹ after requiring that all relevant components of the ATLAS detector were in good working condition during data collection. The uncertainty in the combined 2015 and 2016 integrated luminosity is 2.1%. It was estimated following a methodology similar to the one described in Ref. [25]. The events considered for the eτ (µτ) channel were selected by single-lepton triggers which require the presence of at least one electron (muon) with transverse momentum above 24 GeV (20 GeV) in 2015 and 26 GeV (26 GeV) in 2016. These triggers apply isolation criteria for electrons (muons) with p_T below 60 GeV (40 GeV in 2015 and 50 GeV in 2016). These isolation requirements are looser than the ones applied offline in the light-lepton selections.
Simulated Monte Carlo (MC) samples are used to predict the Z/γ* → ℓτ signal and the background contributions from Z/γ*+jets, W+jets, tt̄, single top-quark, Higgs boson and diboson (WW, WZ and ZZ) production.
Signal samples were simulated using Pythia 8.186 [26] with the NNPDF2.3 parton distribution function (PDF) set [27] and a set of tuned parameters called the A14 tune [28]. The lepton-flavor-violating Z/γ* decay was modeled assuming unpolarized τ-leptons in the final state. In order to use the same production cross section for both signal and the main background, Z/γ* → ττ, event weights computed as a function of the true boson transverse momentum are applied to the signal events to match the more accurate modeling of the Z/γ* production in the Z/γ* → ττ simulation described in the following. After this reweighting procedure, the signal events, together with the Z/γ* → ττ events, are normalized to the Z/γ* production cross section determined from data in the template fit described in Section 7. Therefore, the analysis is independent of the theoretical uncertainty in the Z/γ* production cross section. The SM value of this cross section is 2.1 nb, calculated at NNLO accuracy [29].
The production of Z/γ* → ττ events was simulated with Sherpa 2.2.1 [30]. The NNPDF 3.0 NNLO PDF set [31] was used for both the matrix element calculation and the dedicated parton-shower tuning developed by the authors of Sherpa. The event generation utilized Comix [32] and OpenLoops [33] for the matrix element calculation, which was then matched to the Sherpa parton shower using the ME+PS@NLO prescription [34]. The matrix elements were calculated for up to two additional partons at NLO and for three and four partons at LO in QCD. As stated above, the normalization of this background process, together with the signal events, is determined in a fit to data.
The Z/γ* → µµ, ee events were simulated with Powheg-Box [35][36][37] using the CT10 PDF set [38] and the AZNLO tune [39], and interfaced to Pythia 8.186. The normalization of the Z/γ* → µµ, ee events is determined from data in a dedicated region enhanced in Z → µµ events (Section 5) as a function of the reconstructed transverse momentum of the Z/γ* boson.
The other simulated processes account for only a small fraction (less than 0.3%) of the background events. Samples of W(→ τν) + jets events were simulated with Sherpa 2.2.1. Events with a top-quark pair or a single top quark produced via electroweak t-channel, s-channel and Wt-channel processes were simulated with Powheg-Box using the CT10 PDF set. The parton shower, fragmentation and underlying event were simulated using Pythia 6.428 [40] with the Perugia 2012 tune [41]. EvtGen [42] was used to decay bottom and charm hadrons. Diboson processes were simulated with Sherpa 2.1.0 with the CT10 PDF set. Higgs boson events, H → WW, ττ, , produced via gluon-gluon fusion and vector-boson fusion were simulated with Powheg-Box.
Simulated minimum-bias events were overlaid on all simulated samples to include the effect of pileup. These minimum-bias events were generated with Pythia 8.186, using the A2 tune [43] and the MSTW2008LO PDF set [44]. Each simulated event was processed using the Geant4-based ATLAS detector simulation [45,46] and the same event reconstruction algorithms used for the data. Reconstruction and identification efficiencies, as well as energy calibrations for all selected objects in simulated events, are corrected to match those measured in data.

Event selection and classification
Of the events satisfying the trigger and the quality criteria described in Section 3, the events selected in this analysis are required to contain exactly one isolated electron or muon that is geometrically matched to the object that fired the trigger, and no additional light leptons. These events must also contain at least one τ_had-vis candidate that passes the tight identification. The isolated light lepton and the τ_had-vis candidate are required to have opposite charge, q_ℓ × q_τ_had-vis = −1. Events with one or more b-tagged jets are removed to reject background events with a top-quark pair or a singly produced top quark. To reduce the Z → ℓℓ background, events with 1-prong τ_had-vis candidates that satisfy |η(τ_had-vis)| > 2.2 for the eτ channel or |η(τ_had-vis)| < 0.1 for the µτ channel are rejected. These regions of the detector are excluded because they are insufficiently instrumented and therefore affected by higher ℓ → τ misreconstruction and misidentification rates. The selection described here, referred to hereafter as preselection, defines the sample of events used for the training of the neural network.
Further kinematic selections are applied to define the sample of events in the "signal region" (SR) which are used in the final template fit.Orthogonal sets of events in the so-called "calibration regions" (CR) are defined by inverting some of the preselection or SR selection requirements and used to estimate background contributions in the SR, as described in Section 5.
Events accepted in the SR must satisfy the preselection and the following selections. The transverse mass, m_T(τ_had-vis, E_T^miss) ≡ √(2 p_T(τ_had-vis) E_T^miss [1 − cos ∆φ(τ_had-vis, E_T^miss)]), is required to be smaller than 35 (30) GeV in the eτ (µτ) channel. Signal events are expected to have the missing transverse momentum from the neutrino in a direction close to the τ_had-vis candidate, resulting in small m_T(τ_had-vis, E_T^miss) values. The W(→ ℓν/τν)+jets events and some of the Z/γ* → ττ events have instead higher m_T(τ_had-vis, E_T^miss) values. The effectiveness of this selection is illustrated in Figure 1.
In events with a 1-prong τ_had-vis candidate, an additional selection is applied to further reduce the Z → ℓℓ background. In most of these events, the momentum of the track matched to the 1-prong τ_had-vis candidate corresponds to the original momentum of the light lepton misidentified as τ_had-vis, while the energy deposited in the calorimeter and used to estimate the energy of the τ_had-vis originates from radiation (light-lepton bremsstrahlung) or other sources. Therefore, events in which the invariant mass of the τ_had-vis track and the light lepton, m(track, ℓ), is compatible with the Z boson mass are rejected. In particular, events with a 1-prong τ_had-vis candidate are accepted when m(track, ℓ) < 84 GeV or m(track, ℓ) > 105 GeV if |η(τ_had-vis)| < 2.0, and when m(track, ℓ) < 80 GeV or m(track, ℓ) > 105 GeV if |η(τ_had-vis)| > 2.0. A wider range in m(track, ℓ) is rejected at high |η(τ_had-vis)| because of the smaller signal contribution and the higher Z → ℓℓ background rate. Moreover, events in which the invariant mass of the 1-prong τ_had-vis candidate and the light lepton satisfies 80 GeV < m(τ_had-vis, ℓ) < 100 GeV are required to have m(track, ℓ) > 40 GeV. The impact of this selection is illustrated in Figure 2.
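The two SR requirements above can be expressed compactly. The sketch below is illustrative: the function and argument names are hypothetical, the thresholds are those quoted in the text (in GeV), and the treatment of a candidate at exactly |η| = 2.0, which the text does not specify, is an assumption (it is grouped with the high-|η| case).

```python
import math

def transverse_mass(pt_tau, met, dphi):
    """m_T = sqrt(2 * pT(tau_had-vis) * ETmiss * (1 - cos(dphi)))."""
    return math.sqrt(2.0 * pt_tau * met * (1.0 - math.cos(dphi)))

def passes_mt_cut(m_t, channel):
    """m_T below 35 GeV in the e-tau channel, below 30 GeV in the mu-tau channel."""
    return m_t < (35.0 if channel == "etau" else 30.0)

def passes_one_prong_zll_veto(m_track_lep, m_tau_lep, abs_eta_tau):
    """Z -> ll rejection for events with a 1-prong tau_had-vis candidate."""
    # Lower edge of the rejected m(track, l) window widens at high |eta|.
    low_edge = 84.0 if abs_eta_tau < 2.0 else 80.0  # |eta| == 2.0: assumed high-|eta|
    outside_z_window = m_track_lep < low_edge or m_track_lep > 105.0
    # Extra requirement when m(tau_had-vis, l) itself sits near the Z peak.
    if 80.0 < m_tau_lep < 100.0 and not m_track_lep > 40.0:
        return False
    return outside_z_window
```

For example, a candidate with m(track, ℓ) = 82 GeV passes at |η| = 1.0 but is rejected at |η| = 2.1, reflecting the wider rejected window at high |η|.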
The signal selection efficiency in the SR is 3.2% for the eτ channel and 3.5% for the µτ channel. A summary of the event selection criteria is given in Table 1.
Events accepted in the SR are classified using neural networks (NNs) trained to discriminate Z → ℓτ signal from Z → ττ, Z → ℓℓ and W → ℓν+jets background events. The classification is based on event kinematic properties that are extracted by the NN from the reconstructed momenta of the selected particles, as well as from other event variables. The NN achieves good performance using low-level variables, such as the particle momentum components, due to the network's capability to build non-linear relations between input variables.
Table 1 (preselection entries): one isolated tight light lepton with p_T > 30 GeV matched to a lepton selected at trigger level; leading τ_had-vis with p_T > 20 GeV, N_tracks(τ) = 1 or 3, and passing tight identification.
Three types of NN classifiers, "Z", "Zll" and "W", are trained to distinguish signal from Z → ττ, Z → ℓℓ and W → ℓν backgrounds, respectively. These classifiers are trained separately in the eτ and µτ channels because of the different detector acceptances, but combine 1-prong and 3-prong τ_had-vis candidates. Simulated events passing the preselection (Table 1) are used to train, optimize and validate the classifiers. In order to increase the size of the available training samples for Z → ℓτ and Z → ττ processes with a true hadronic τ-lepton decay, all events with a τ_had-vis candidate that passes the loose identification are used. Moreover, in the events used for the Zll classifiers, the misreconstructed τ_had-vis is required to be either a true muon or electron. With these requirements, about 40 000 signal events, 200 000 Z → ττ events and 80 000 W → ℓν events are used for training in each channel. For Z → ℓℓ, about 30 000 events are used in the eτ channel and only 5000 events in the µτ channel. The limited number of Z → µµ events is due to the low µ → τ misreconstruction rate, and leads to poor classification power for the Zll NN in the µτ channel. However, the Z → µµ background is effectively reduced by the selection on m(track, ℓ) and m(τ_had-vis, ℓ) described earlier.
The input variables common to all the classifiers are: the light lepton, τ_had-vis and E_T^miss momentum components, assuming vanishing masses; the collinear mass m_coll, defined as the invariant mass of the ℓ-τ_had-vis-ν system, where ν is the neutrino from the τ decay, which is assumed to have a momentum that is equal in the transverse plane to the measured E_T^miss and collinear in η with the τ_had-vis candidate; and ∆α [47], which is built from p(τ_had-vis) and p(ℓ), the four-momenta of the τ_had-vis and the light-lepton candidates respectively, and from the rest masses m_Z and m_τ, which take on the values reported by the Particle Data Group [20]. The variable ∆α helps to discriminate signal events, expected to be around ∆α = 0, from Z → ττ events, where ∆α is negative due to the presence of additional neutrinos. Even though not specifically targeted by this variable, Z → ℓℓ and W → ℓν events tend to be at vanishing and positive values of ∆α, respectively, as shown later in Figures 5-8. The invariant mass m(ℓ, τ_had-vis) is also used in the Zll classifier. In the limit of very large training statistics, the light lepton, τ_had-vis and E_T^miss momentum components would be sufficient for the NN to learn the full event kinematics. However, with the available training samples, the high-level variables m_coll, ∆α and m(ℓ, τ_had-vis) were found to improve the NN classification power.
The NN inputs are preprocessed to harmonize their magnitudes and to remove known symmetries, as is required for optimal training. The preprocessing consists of the following steps:
1. Boost: after computing m_coll, ∆α and p^tot = p(ℓ) + p(τ_had-vis) + E_T^miss in the lab frame, the light lepton, τ_had-vis and E_T^miss momenta are boosted to the frame in which their total momentum vanishes. The longitudinal component of the three-momentum of E_T^miss is zero in the lab frame.
2. Rotation: the light lepton, τ_had-vis and E_T^miss momenta are first rotated so that the three-momentum of the light lepton is along the positive z-axis. A second rotation about the z-axis is applied so that the τ_had-vis momentum has a vanishing component on the y-axis.
3. Scaling: each input variable is scaled by subtracting its mean and by dividing by its standard deviation, where the mean and the standard deviation are computed on the set of signal and background events used in the training of each classifier.
The boost and the rotation are used to remove the degeneracy among apparently different events which are instead equivalent under Lorentz transformation. The scaling is needed because the network works best with input variables of the same magnitude. The same preprocessing procedure, with the same mean and standard deviation values, is applied to all the events on which the classifiers are evaluated. After preprocessing, six of the twelve components of the light lepton, τ_had-vis and E_T^miss momenta are either vanishing or redundant, and therefore not included in the network inputs. The resulting lists of input variables are given in Table 2. The transverse component, p_T^tot, of the total momentum p^tot in the lab frame is also included as otherwise this information would be lost after the preprocessing. The distributions of some of the NN input variables are shown in Section 7.
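The three preprocessing steps can be sketched as follows, treating every object as massless as stated for the inputs. This is a hedged illustration rather than the analysis code: the function names and the tuple layout (E, p_x, p_y, p_z) are hypothetical, the two rotations are implemented as one change of basis, and the choice of a positive τ_had-vis x-component after rotation is an assumption, since the text only requires a vanishing y-component.

```python
import math

def massless(px, py, pz):
    """Four-vector (E, px, py, pz) with vanishing mass."""
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)

def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def boost_to_zero_momentum_frame(vectors):
    """Step 1: boost to the frame in which the summed three-momentum vanishes."""
    e_tot = sum(v[0] for v in vectors)
    beta = [sum(v[i] for v in vectors) / e_tot for i in (1, 2, 3)]
    b2 = dot3(beta, beta)
    if b2 == 0.0:
        return list(vectors)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    boosted = []
    for v in vectors:
        bp = dot3(beta, v[1:])
        coeff = (gamma - 1.0) * bp / b2 - gamma * v[0]
        momentum = tuple(v[i + 1] + coeff * beta[i] for i in range(3))
        boosted.append((gamma * (v[0] - bp),) + momentum)
    return boosted

def rotate_to_lepton_axis(lepton, tau, met):
    """Step 2: lepton along +z, tau in the x-z plane (tau p_x > 0 is an assumed sign)."""
    norm = math.sqrt(dot3(lepton[1:], lepton[1:]))
    ez = [c / norm for c in lepton[1:]]
    along = dot3(tau[1:], ez)
    perp = [tau[i + 1] - along * ez[i] for i in range(3)]
    perp_norm = math.sqrt(dot3(perp, perp))  # zero if tau is collinear with the lepton
    ex = [c / perp_norm for c in perp]
    ey = [ez[1] * ex[2] - ez[2] * ex[1],
          ez[2] * ex[0] - ez[0] * ex[2],
          ez[0] * ex[1] - ez[1] * ex[0]]  # ey = ez x ex, right-handed basis
    return [(v[0], dot3(v[1:], ex), dot3(v[1:], ey), dot3(v[1:], ez))
            for v in (lepton, tau, met)]

def standardize(value, mean, std):
    """Step 3: subtract the training-sample mean and divide by its standard deviation."""
    return (value - mean) / std
```

In this frame the lepton has only a z-component and the τ_had-vis and E_T^miss momenta lie in the x-z plane, which is what makes several of the twelve components vanishing or redundant.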
The NN classifiers are sequential models optimized for binary classification. They are based on the Keras 1.1.1 [48] and TensorFlow 0.11 [49] packages, using a standard implementation for binary classifiers having two hidden dense layers with 16 nodes each.
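To make the architecture concrete, the forward pass of a network with two hidden dense layers of 16 nodes and a single output can be written without any framework. Everything beyond the layer sizes is an assumption of this sketch: the ReLU hidden activations, the sigmoid output, the random untrained weights, and the use of nine inputs (six momentum components plus m_coll, ∆α and p_T^tot, following the text) are illustrative choices, and the function names are hypothetical.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per output node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained; for shape illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_classifier(n_inputs, rng):
    """Two hidden layers with 16 nodes each, followed by a single output node."""
    return [init_layer(n_inputs, 16, rng), init_layer(16, 16, rng), init_layer(16, 1, rng)]

def predict(layers, inputs):
    """Output in (0, 1), interpreted as a signal-like score for a binary classifier."""
    hidden = dense(inputs, *layers[0], relu)
    hidden = dense(hidden, *layers[1], relu)
    return dense(hidden, *layers[2], sigmoid)[0]
```

A call such as `predict(build_classifier(9, random.Random(0)), [0.1] * 9)` returns a score between 0 and 1; with trained weights this would correspond to one of the "Z", "Zll" or "W" classifier outputs.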
In order to obtain a single discriminating variable, the outputs of the classifiers evaluated in each event are combined in the following way. In events with 3-prong τ_had-vis candidates, where no further rejection is needed against the Z → ℓℓ events, the Z and W classifiers are combined as the distance in the two-dimensional plane from the point with highest NN outputs, where the NN outputs can range within [0, 1]. In a similar fashion, for events with 1-prong τ_had-vis candidates, the Z, W and Zll classifiers are combined in three dimensions. The binned distributions of these combined classifiers for the events selected in the SR are used in the final template fit, as discussed in Section 7.
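One way to realize a distance-based combination of this kind is sketched below. The Euclidean distance from the point where every classifier output equals 1 follows the description in the text; the final mapping 1 − d/√n, which rescales the result to [0, 1] with signal-like events near 1, is an assumption of this sketch and may differ from the expression used in the analysis. The function names are hypothetical.

```python
import math

def distance_from_best_point(outputs):
    """Euclidean distance from the point (1, ..., 1) in classifier-output space."""
    return math.sqrt(sum((1.0 - o) ** 2 for o in outputs))

def combined_output(outputs):
    """Assumed rescaling: 1 for all outputs at 1, 0 for all outputs at 0."""
    return 1.0 - distance_from_best_point(outputs) / math.sqrt(len(outputs))

# 3-prong events would pass two outputs (Z, W); 1-prong events three (Z, W, Zll).
```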

Background estimation
Background processes are categorized according to the origin of the τ_had-vis candidate, which can be a true τ-lepton, or a misidentified light lepton, or a misidentified quark- or gluon-initiated jet. Different techniques are used to estimate these background contributions in the SR, as well as to model their expected combined NN output distributions, which are used in the template fit to data (Section 7). As described in the following, the shapes of all components are determined prior to the fit, as are the normalizations for all but the Z → ττ and fake components, which are determined in the fit.
Backgrounds from processes with a true hadronically decaying τ-lepton are estimated from simulation. The Z → ττ decays are the dominant source of these events. As detailed in Section 3, they are modeled via simulation but their total yield in the SR is left unconstrained in the template fit to data in order to remove the theoretical systematic uncertainties in the Z production cross section.
Processes where the τ_had-vis candidate is a misidentified light lepton are also estimated from simulation. These are mostly Z → ℓℓ events. The simulated rate for misidentifying electrons as 1-prong τ_had-vis candidates is corrected using data [23]. Due to the lack of dedicated measurements of the rates of misidentifying electrons as 3-prong τ_had-vis candidates and muons as 1-prong τ_had-vis candidates, conservative uncertainties are assigned which have negligible impact on the precision of the measured B(Z → ℓτ).
The normalization of the Z → ℓℓ events is determined from data with a sample of events with an opposite-charge muon pair with 81 GeV < m_µµ < 101 GeV. The preselection requirements on the leading muon, the absence of b-tagged jets and the veto on additional light leptons are imposed. A correction factor derived as the relative difference between the predicted and observed numbers of Z → µµ events is applied to both the Z → ee and Z → µµ yields in the SR. This correction is applied as a function of the reconstructed transverse momentum of the Z/γ* boson. In the Z → µµ-enhanced region, the Z/γ* boson momentum is computed as the vector sum of the muon pair, while in the SR it is the vector sum of the misidentified τ_had-vis candidate and the remaining light lepton. The uncertainty in this correction is statistical only. Differences between the electron and muon acceptances are covered by the systematic uncertainties in the electron and muon selections, which are accounted for in the Z → ℓℓ predictions in the SR.
Events where the τ_had-vis candidate originates from a quark- or gluon-initiated jet are estimated from data, as discussed in the following. Background contributions originating from processes where only the light lepton is misidentified as τ_had-vis are found to be negligible.
The background contribution from events where the τ_had-vis candidate arises from a misidentified jet is referred to as "fakes" and is dominated by W+jets and multi-jet processes. A data-driven fake-factor technique is used to estimate this contribution. It uses events in the so-called "fail sideband", which is the set of events passing all but one of the SR selection requirements: the τ_had-vis candidate is required to fail the tight identification requirement. This is a set of events orthogonal to the ones selected in the SR and enhanced with fakes. The yield of these events is corrected by the fake factor, which is the transfer factor needed to scale the fail sideband sample to the amount of background expected in the signal region, which requires an identified τ_had-vis candidate. This factor is process-specific as it depends on the fractions of quark- and gluon-initiated jets that are misidentified as τ_had-vis candidates. It also depends on properties of the τ_had-vis candidate. To capture these effects, different fake factors are measured in samples of events dominated by different processes and different τ_had-vis kinematic properties.
Fake factors F_i are measured in four data samples of events dominated by W + jets ("CRW"), tt̄ and single-top ("CRT"), Z → ℓℓ + jets ("CRZll"), and multi-jet ("CRQ") events. The selections that define these "calibration regions" (CR) are similar to the SR selection but define orthogonal samples dominated by the target source of background. These selections are detailed in Table 3 together with the expected purities in each CR for the target process as estimated from simulation. For CRQ the purity is estimated as the number of events in data, after subtracting the contribution from other processes estimated from simulation, divided by the total number of events.
In each CR, F_i is measured in data as the ratio of the number of events where the τ_had-vis candidate passes the tight identification to the number of events where the τ_had-vis candidate fails, in bins of the τ_had-vis p_T. Contributions from background processes that are not the target process of the CR or from events where the τ_had-vis candidate does not originate from a jet are subtracted from data using simulation. The four F_i are combined into a weighted average F = Σ_i R_i F_i, where R_i is the fraction of events from fakes in the SR as predicted by simulation for each process. For multi-jet events, this fraction is defined differently.
Fake factors are measured separately for τ_had-vis candidates with one and with three associated tracks. For 1-prong candidates, they are estimated in two-dimensional bins of τ_had-vis p_T and τ_had-vis track p_T, since the associated track momentum is used in the selection of these candidates, while for 3-prong candidates they are estimated only in bins of τ_had-vis p_T. The choice of bin boundaries is optimized to capture the statistically significant variations of the fake factors as a function of the τ_had-vis properties, while retaining enough events per bin. An additional binning as a function of τ_had-vis |η| was found to be unnecessary. The measured fake factors are shown in Table 4. For events with low τ_had-vis p_T and high τ_had-vis track p_T, the fake factors are large and have large statistical uncertainties because there are few events in the calibration regions. However, these fake factors are applied only to a small fraction of events in the sidebands.
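The measurement and combination of fake factors described above can be illustrated with a short sketch. The function names, dictionary keys and all numerical values are hypothetical. In particular, taking the multi-jet fraction as the remainder 1 − (R_W + R_T + R_Zll) is an assumption made here so that the weights sum to one; the defining expression is not reproduced in this text.

```python
def fake_factor(n_pass_data, n_fail_data, n_pass_mc_other, n_fail_mc_other):
    """Pass/fail ratio in one CR bin after subtracting simulated non-target events."""
    return (n_pass_data - n_pass_mc_other) / (n_fail_data - n_fail_mc_other)

def combined_fake_factor(factors, simulated_fractions):
    """Weighted average F = sum_i R_i * F_i over the four processes.

    factors             : F_i for keys "W", "T", "Zll", "Q"
    simulated_fractions : R_i for the simulated processes "W", "T", "Zll"
    """
    fractions = dict(simulated_fractions)
    # ASSUMPTION: multi-jet fraction taken as the remainder so that sum(R_i) = 1.
    fractions["Q"] = 1.0 - sum(simulated_fractions.values())
    return sum(fractions[name] * factors[name] for name in factors)

# Hypothetical numbers, for illustration only.
example = combined_fake_factor(
    {"W": 0.20, "T": 0.10, "Zll": 0.15, "Q": 0.30},
    {"W": 0.60, "T": 0.05, "Zll": 0.05},
)
```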
Table 4: The fake factors binned in τ_had-vis p_T and τ_had-vis track p_T for 1-prong, and τ_had-vis p_T for 3-prong events as determined in the SR.

[Table 4 body not recovered. Surviving headers: 1-prong; eτ events, µτ events; τ_had-vis p_T 20-30 GeV.]
The number of events from fakes in the SR is N_SR^fakes = Σ_k F_k × (N_SR,data^fail,k − N_SR,MC,not jet→τ^fail,k), where F_k is the fake factor corresponding to the p_T (and track p_T for 1-prong τ_had-vis) bin k, N_SR,data^fail,k is the number of data events in the fail sideband in bin k, and N_SR,MC,not jet→τ^fail,k is the number of events in the fail sideband in bin k for which the τ_had-vis candidate did not originate from a jet as predicted by simulation.
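The fakes estimate defined in terms of F_k and the two fail-sideband yields can be sketched as below, assuming that the per-bin products are summed over the bins k. The function name and the example yields are hypothetical.

```python
def fakes_yield(fake_factors, n_fail_data, n_fail_mc_not_jet):
    """Sum over bins k of F_k * (fail-sideband data - simulated non-jet events)."""
    if not (len(fake_factors) == len(n_fail_data) == len(n_fail_mc_not_jet)):
        raise ValueError("all inputs must have one entry per bin")
    return sum(f * (data - mc)
               for f, data, mc in zip(fake_factors, n_fail_data, n_fail_mc_not_jet))

# Hypothetical three-bin example.
n_fakes = fakes_yield([0.2, 0.1, 0.5], [1000.0, 400.0, 20.0], [100.0, 50.0, 0.0])
```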
The sources of uncertainty in the estimate of the fake background are the statistical uncertainties in the F measurements in each bin, the statistical uncertainties of the data in the fail sideband and the uncertainty in R_i. All statistical uncertainties are treated as independent. The uncertainty in R_i is estimated by varying the estimated R_W by 50%, although this has a negligible impact on the sensitivity.
In order to reduce the uncertainty in the overall normalization of the contributions from fakes in the SR, normalization factors for the fake-component templates are free parameters in the fit to data, as discussed in Section 7. As a result, only uncertainties in the template shapes affect the fitted yields for fake components.
The simulation and the data-driven techniques used to model the signal and background processes were validated in samples enriched with fakes and Z → ττ events. Both the predicted NN input and output distributions are in agreement with data.

Systematic uncertainties
Systematic uncertainties affecting the estimations of signal and background contributions arise from the theoretical predictions and the detector modeling used in simulation, the luminosity measurement, and the data-driven background estimations.
The theoretical uncertainties in the production cross section affect only the predictions of the simulated W+jets, top, diboson and Higgs boson events with a true hadronically decaying τ-lepton, since the Z → ττ and signal yields are determined in the template fit to data. These constitute a small fraction of the background events in the SR, and a conservative uncertainty in their production cross sections was assigned with negligible impact on the final results. As described in Section 5, Z → ℓℓ events are normalized to data using Z → µµ events, so the theoretical uncertainty in the Z → ℓℓ normalization is irrelevant. The statistical uncertainty of 0.1% in this normalization correction is included as a systematic uncertainty.
Uncertainties arising from the simulation of the detector and pileup conditions in the reconstruction of τ_had-vis candidates, muons, electrons, jets (including b-tagging) and E_T^miss are evaluated. Sources of uncertainty in the τ_had-vis candidate include the reconstruction and identification efficiencies and the energy calibration. These are applied only to τ_had-vis candidates from hadronically decaying τ-leptons. For misidentified τ_had-vis candidates originating from an electron or a muon, systematic uncertainties in the misidentification rates are assigned using a data-driven method, as detailed in Section 5. For the simulation of electron and muon candidates, uncertainties in the trigger, reconstruction, identification and isolation efficiencies are accounted for. The effect of uncertainties in the light-lepton momentum scale and resolution is also evaluated. For jets, uncertainties in the jet momentum scale and resolution, as well as in the b-tagging (in)efficiencies are accounted for. All experimental uncertainties are propagated to the E_T^miss calculation. In addition, uncertainties in the energy scale and resolution of the E_T^miss soft term are considered.
The 2.1% uncertainty in the measured luminosity (Section 3) is only considered for the simulated W+jets, top, diboson and Higgs boson contributions, whose normalizations are based purely on simulation, without any data-driven estimate.
Data-driven techniques are used to estimate the background contributions from events with a τ_had-vis candidate originating from either a light lepton or a quark- or gluon-initiated jet. The systematic uncertainties in these methods are described in Section 5.
To illustrate the sizes of the systematic uncertainties, Figure 3 shows the relative uncertainties of the total background predictions as a function of the combined NN output for the dominant systematic uncertainties. The uncertainties in the normalizations of the Z and fake components, estimated from the expected statistical power of the fit described in Section 7, and the statistical uncertainty in the fake factor are the largest sources of systematic uncertainty, contributing on average between 3% and 6%. The systematic uncertainty in R_W is also relevant and ranges between 1% and 6% over the different final states. All other systematic uncertainties affect the total background prediction by less than one percent.
Figure 3: Expected uncertainties in the total background predictions in the SR as a function of the combined NN output for the dominant systematic uncertainties in eτ (top) and µτ (bottom) channels with 1-prong (left) and 3-prong (right) τ_had-vis candidates. The uncertainties in the normalizations of the Z and fake components are based on the expected statistical power of the fit described in Section 7. "Muon efficiency statistics" refers to the statistical uncertainty of the corrections applied to the simulated muon reconstruction efficiency [13]. "Tau energy scale in situ" refers to the uncertainty of the corrections applied to the energy of the τ_had-vis candidate based on measurements with Z → ττ data [23].

Results and statistical interpretation
A binned maximum-likelihood fit to data, performed with the statistical analysis packages RooFit [50], RooStats [51] and HistFitter [52], is used to compare the observed binned distributions of the combined NN classifiers in the SR with the model, and to extract evidence of signal events. The parameter of interest in this fit is the signal strength modifier µ, which quantifies the size of the LFV decay branching fraction B(Z → ℓτ).
Two independent fits are performed for the eτ and µτ channels, and in each fit events with a 1-prong τ had-vis candidate are considered separately from those with a 3-prong candidate. In the fit of the events with 1-prong τ had-vis candidates, because of the way the NN classifiers are combined, only a few background-like events have an NN output value below 0.15; these are excluded. Independent templates, estimated as described in previous sections, are used for signal, Z → ττ, fakes, Z → ℓℓ, top events, and W(→ τν)+jets events. The small contributions from Higgs boson and diboson events are summed into a single template, referred to as "Other".
The likelihood is the product of Poisson probability density functions describing the observed number of events in each bin. It also includes Gaussian, Poisson and log-normal distributions to constrain the nuisance parameters associated with the systematic, statistical and theoretical uncertainties in the predicted number of events, respectively. Three additional free parameters are included: µ(Z) determines the normalizations of the Z → ττ and signal events while µ(fakes_1P) and µ(fakes_3P) control the normalization of the fake component in events with a 1-prong or a 3-prong τ had-vis candidate, respectively. These parameters are fit independently in the eτ and µτ channels. Within the same channel, the same µ(Z) is used to fit events with 1-prong and 3-prong τ had-vis candidates, while µ(fakes_1P) and µ(fakes_3P) are used to fit independently the corresponding contributions from fakes. The fitted values of these parameters are sensitive to the yields of events with low NN outputs, which are dominated by contributions from Z → ττ and fakes. Fitting these normalization parameters reduces the systematic uncertainties in the predictions of the Z → ττ and fake backgrounds in the bins at high NN output, which are sensitive to the Z → ℓτ signal. The free parameter µ(Z), which scales the normalizations of both the Z → ττ and signal events, ensures that the two processes correspond to the same Z production cross section.

Table 5 reports the total observed and post-fit yields in the SR. The observed and post-fit expected distributions of the combined NN output are shown in Figure 4. As reported in Table 6, the best-fit values for µ(Z), µ(fakes_1P) and µ(fakes_3P) are consistent between the eτ and µτ channels, while the best-fit value for B(Z → ℓτ) is consistent with zero in the µτ channel, B(Z → µτ) = (−0. Observed and expected post-fit distributions of the unscaled NN inputs of the events in the SR are shown in Figures 5-8. The post-fit distributions are compatible with data. An alternative fit combining the eτ and µτ channels with two independent parameters of interest and the same shared free parameter µ(Z) yielded the same results as the nominal fit.
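The structure of the likelihood described above can be illustrated with a minimal sketch. It is not the HistFitter model used in the analysis: the four-bin templates, the 10% uncertainty on the "Other" contribution, and the function names are all hypothetical, and only a single Gaussian-constrained nuisance parameter is included. The sketch does reproduce one feature stated in the text, namely that µ(Z) scales both the Z → ττ and the signal templates.

```python
import math

# Hypothetical 4-bin templates (events per bin of NN output); not from the paper.
SIGNAL = [0.5, 1.0, 4.0, 12.0]
ZTAUTAU = [400.0, 250.0, 90.0, 20.0]
FAKES = [300.0, 150.0, 40.0, 10.0]
OTHER = [20.0, 15.0, 8.0, 4.0]
OTHER_REL_UNC = 0.10  # assumed normalization uncertainty on "Other"


def expected_yields(mu_sig, mu_z, mu_fakes, theta):
    """Expected events per bin.

    mu_z multiplies both Z->tautau and signal, so both follow the same
    Z production normalization; theta is a Gaussian-constrained nuisance
    parameter shifting the "Other" normalization.
    """
    return [
        mu_z * (mu_sig * s + z) + mu_fakes * f + (1.0 + OTHER_REL_UNC * theta) * o
        for s, z, f, o in zip(SIGNAL, ZTAUTAU, FAKES, OTHER)
    ]


def nll(data, mu_sig, mu_z, mu_fakes, theta):
    """Negative log-likelihood: Poisson term per bin plus a Gaussian constraint."""
    nu = expected_yields(mu_sig, mu_z, mu_fakes, theta)
    poisson = sum(
        v - n * math.log(v) + math.lgamma(n + 1.0) for n, v in zip(data, nu)
    )
    constraint = 0.5 * theta**2  # unit Gaussian on theta, constant dropped
    return poisson + constraint


# Background-only pseudo-data equal to the expectation (an "Asimov" dataset).
asimov = expected_yields(0.0, 1.0, 1.0, 0.0)
reference = nll(asimov, 0.0, 1.0, 1.0, 0.0)
```

With pseudo-data equal to the background-only expectation, any shift of the signal strength or of a normalization factor increases the negative log-likelihood relative to `reference`, which is the behavior a minimizer exploits when fitting real data.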
The result of the search for Z → µτ decays presented here is combined with the result published by ATLAS with 20.3 fb⁻¹ of data at a center-of-mass energy of √s = 8 TeV [7]. In this previous analysis, a 95% CL upper limit was set at B(Z → µτ) < 1.7 × 10⁻⁵. The expected upper limit was 2.6 × 10⁻⁵.
The analysis of the 8 TeV data was based on a template fit to the observed distributions in data of the m_τµ^MMC mass, as reconstructed by using the Missing Mass Calculator [54]. This is a likelihood-based mass estimator optimized for Z → ττ events. The dominant irreducible Z → ττ background was estimated using so-called embedded events [55] and was normalized to data. The reducible background of events with τ had-vis candidates originating from misidentified jets was also estimated from data using events with µτ pairs with the same electric charges. The other smaller background contributions were estimated from simulation. The Z → µτ signal was simulated and was normalized using the predicted Z production cross section at 8 TeV.
Table 5: The total observed and post-fit event yields in the SR for the eτ (top) and µτ (bottom) channels. The uncertainties include both the statistical and systematic contributions. The correlations between the uncertainties in individual contributions are accounted for in the quoted uncertainties in the total post-fit yields.

The 8 TeV and 13 TeV analyses are combined using the same parameter of interest, but assuming no other correlation. Indeed, the estimates of the two dominant sources of background, Z → ττ and fakes, are based on different data and different methods. The signal predictions are also uncorrelated since the Z production cross section is either predicted, in the 8 TeV analysis, or determined from data, in the 13 TeV analysis. Furthermore, the systematic uncertainties related to the detector modeling in simulated data are typically based on auxiliary measurements performed on different data. If these modeling uncertainties are set to zero, the combined upper limit changes by only 3%. This 3% represents an upper bound on how much the combined limit can change if different assumptions are made about correlations in systematic uncertainties related to detector modeling.
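The combination of two analyses that share only the parameter of interest can be sketched as follows. Because no other correlation is assumed, the joint likelihood is the product of the two individual likelihoods, so their negative log-likelihoods simply add. Each analysis is reduced here to a single-bin counting experiment, which is a strong simplification of the template fits described in the text; all event counts, signal yields and function names are hypothetical.

```python
import math


def counting_nll(branching, n_obs, background, signal_per_unit):
    """Poisson negative log-likelihood (up to a constant) for one counting experiment."""
    expected = background + signal_per_unit * branching
    return expected - n_obs * math.log(expected)


# Hypothetical inputs: (observed events, expected background, signal yield per unit B).
# The branching fraction B is expressed in units of 1e-5.
RUN_8TEV = (198, 200.0, 4.0)
RUN_13TEV = (105, 100.0, 2.0)


def combined_nll(branching):
    """Uncorrelated analyses sharing one parameter of interest: the NLLs add."""
    return counting_nll(branching, *RUN_8TEV) + counting_nll(branching, *RUN_13TEV)


def scan_minimum(nll, lo=-5.0, hi=5.0, steps=1000):
    """Return the grid point in [lo, hi] that minimizes nll."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=nll)


best_8 = scan_minimum(lambda b: counting_nll(b, *RUN_8TEV))
best_13 = scan_minimum(lambda b: counting_nll(b, *RUN_13TEV))
best_comb = scan_minimum(combined_nll)
```

In this toy setup the combined best-fit value lies between the two individual best-fit values, and the combined likelihood curve is narrower than either input, which is why a combination can yield a tighter limit than each analysis alone.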

Conclusions
Direct searches for lepton flavor violation in decays of the Z boson are performed using a data sample of proton-proton collisions recorded by the ATLAS detector at the LHC corresponding to an integrated luminosity of 36.1 fb⁻¹ at a center-of-mass energy of √s = 13 TeV. The analysis selects events consistent with the decay of a Z boson into an electron or muon and a hadronically decaying τ-lepton. In these decays the τ-lepton is assumed to be unpolarized. Neural network classifiers are used to discriminate signal from backgrounds, and the NN output distributions are analyzed in a template fit to data.
No significant excess of events above the expected background is observed and upper limits on the lepton-flavor-violating branching ratios are set at the 95% confidence level using the CLs method: B(Z → µτ) < 2.4 × 10⁻⁵ and B(Z → eτ) < 5.8 × 10⁻⁵. The corresponding expected upper limits are 2.4 × 10⁻⁵ and 2.8 × 10⁻⁵, respectively. An excess of data over the expected backgrounds is observed in the eτ final state with a significance of 2.3σ.
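The idea behind the CLs method can be illustrated with a deliberately simplified example. The analysis derives its limits from the full binned likelihood with nuisance parameters; the sketch below instead uses a single-bin counting experiment with a perfectly known background, where CLs is the ratio of the Poisson probabilities of observing at most the observed count under the signal-plus-background and background-only hypotheses. The function names and inputs are hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    return sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_obs + 1)
    )


def cls_value(signal, n_obs, background):
    """CLs = CL_{s+b} / CL_b, using the observed count as the test statistic."""
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


def cls_upper_limit(n_obs, background, alpha=0.05):
    """Signal yield excluded at confidence level 1 - alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while cls_value(hi, n_obs, background) > alpha:  # expand until excluded
        hi *= 2.0
    for _ in range(60):  # CLs decreases monotonically with the signal yield
        mid = 0.5 * (lo + hi)
        if cls_value(mid, n_obs, background) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For zero observed events the CLs value reduces to exp(−s), so the 95% CL limit is ln 20 ≈ 3.0 signal events regardless of the background. Dividing by CL_b is what prevents a downward background fluctuation from excluding arbitrarily small signals.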
No upper limits on B(Z → eτ) from ATLAS data have been published previously. The current best upper limit is from LEP at B(Z → eτ) < 0.98 × 10⁻⁵.
The result on B(Z → µτ) presented here is combined with the previous ATLAS result based on 20.3 fb⁻¹ of data at a center-of-mass energy of √s = 8 TeV. The combined 95% CL upper limit is B(Z → µτ) < 1.3 × 10⁻⁵, to be compared with the LEP upper limit of B(Z → µτ) < 1.2 × 10⁻⁵.

Figure 2: Expected distributions of m(track, ℓ) versus m(τ had-vis, ℓ) in signal (left) and Z → ℓℓ (right) events with 1-prong τ had-vis candidates in the eτ (top) and µτ (bottom) channels after the SR selection except for the cuts on these two variables (Table 1).
total momentum
m coll: collinear mass
∆α: see Eq. (1) [47]
m(ℓ, τ had-vis): invariant mass of light lepton and τ had-vis
were therefore included among the NN inputs.

Figure 4: Observed and expected post-fit distributions of the combined NN output in the SR for the eτ (top) and µτ (bottom) channels, for 1-prong (left) and 3-prong (right) τ had-vis candidates. The filled histogram stacked on top of the backgrounds represents the signal normalized to the best-fit B(Z → ℓτ). The overlaid dashed line represents the expected distribution for the signal normalized to B(Z → ℓτ) = 10⁻³. In the panels below each plot, the ratios of the observed data (dots) and the post-fit background plus signal (solid line) to the post-fit background are shown. The hatched error bands represent the combined statistical and systematic uncertainties. The first and last bins include underflow and overflow events, respectively.

Figure 5: Observed and expected post-fit distributions of unscaled NN inputs in the SR for the eτ channel with 1-prong τ had-vis candidates. The fit is based on profiling on the combined NN classifier, but not directly on these variables. The filled histogram stacked on top of the backgrounds represents the signal normalized to the best-fit B(Z → ℓτ). The overlaid dashed line represents the expected distribution for the signal normalized to B(Z → ℓτ) = 10⁻³. In the panels below each plot, the ratios of the observed data (dots) and the post-fit background plus signal (solid line) to the post-fit background are shown. The hatched error bands represent the combined statistical and systematic uncertainties. The first and last bins include underflow and overflow events, respectively.

Figure 6: Observed and expected post-fit distributions of unscaled NN inputs in the SR for the eτ channel with 3-prong τ had-vis candidates. The fit is based on profiling on the combined NN classifier, but not directly on these variables. The filled histogram stacked on top of the backgrounds represents the signal normalized to the best-fit B(Z → ℓτ). The overlaid dashed line represents the expected distribution for the signal normalized to B(Z → ℓτ) = 10⁻³. In the panels below each plot, the ratios of the observed data (dots) and the post-fit background plus signal (solid line) to the post-fit background are shown. The hatched error bands represent the combined statistical and systematic uncertainties. The first and last bins include underflow and overflow events, respectively.

Figure 7: Observed and expected post-fit distributions of unscaled NN inputs in the SR for the µτ channel with 1-prong τ had-vis candidates. The fit is based on profiling on the combined NN classifier, but not directly on these variables. The filled histogram stacked on top of the backgrounds represents the signal normalized to the best-fit B(Z → ℓτ). The overlaid dashed line represents the expected distribution for the signal normalized to B(Z → ℓτ) = 10⁻³. In the panels below each plot, the ratios of the observed data (dots) and the post-fit background plus signal (solid line) to the post-fit background are shown. The hatched error bands represent the combined statistical and systematic uncertainties. The first and last bins include underflow and overflow events, respectively.

Figure 8: Observed and expected post-fit distributions of unscaled NN inputs in the SR for the µτ channel with 3-prong τ had-vis candidates. The fit is based on profiling on the combined NN classifier, but not directly on these variables. The filled histogram stacked on top of the backgrounds represents the signal normalized to the best-fit B(Z → ℓτ). The overlaid dashed line represents the expected distribution for the signal normalized to B(Z → ℓτ) = 10⁻³. In the panels below each plot, the ratios of the observed data (dots) and the post-fit background plus signal (solid line) to the post-fit background are shown. The hatched error bands represent the combined statistical and systematic uncertainties. The first and last bins include underflow and overflow events, respectively.

Table 1: Overview of the event selection. More details are given in Sections 2 and 4.

Table 2: Input variables for the NN classifiers. The first six quantities are in the boosted and rotated frame described in the text; the last four are in the laboratory frame.

Table 3: Calibration regions used to derive fake factors. Differences from the SR selection (Table 1) are listed together with the purities for the target processes as expected from simulation.

Table 6: Best-fit values for B(Z → ℓτ) and the other free parameters, and exclusion upper limits in the eτ and µτ channels. The uncertainties include both the statistical and systematic contributions.