Search for lepton-flavor violation in different-flavor, high-mass final states in $pp$ collisions at $\sqrt{s}=13$ TeV with the ATLAS detector

A search is performed for a heavy particle decaying into different-flavor, dilepton pairs ($e\mu$, $e\tau$ or $\mu\tau$), using 36.1 fb$^{-1}$ of proton-proton collision data at $\sqrt{s}=13$ TeV collected in 2015-2016 by the ATLAS detector at the Large Hadron Collider. No excesses over the Standard Model predictions are observed. Bayesian lower limits at the 95% credibility level are placed on the mass of a $Z^{\prime}$ boson, the mass of a supersymmetric $\tau$-sneutrino, and on the threshold mass for quantum black-hole production. For the $Z^{\prime}$ and sneutrino models, upper cross-section limits are converted to upper limits on couplings, which are compared with similar limits from low-energy experiments and which are more stringent for the $e\tau$ and $\mu\tau$ modes


Introduction
Direct charged-lepton flavor violation (LFV) is forbidden in the Standard Model (SM) of particle physics but is allowed in many extensions of the SM. Many such models predict new particles with LFV decays, such as Z bosons [1], scalar neutrinos in R-parity-violating (RPV) [2,3] supersymmetry (SUSY) and quantum black holes (QBH) in low-scale gravity [4]. Processes with flavor-violating dilepton decays are expected to produce pairs of prompt, different-flavor leptons, a final state with a clear experimental signature and a low background from SM processes. The Z/γ * → process is an irreducible background for same-flavor lepton searches but in different-flavor searches is limited to the production and decay of a ττ pair. This paper describes a search for new phenomena in final states with two leptons of different flavor using 36.1 fb −1 of data from proton-proton (pp) collisions at √ s = 13 TeV at the Large Hadron Collider (LHC).
A common extension of the SM is the addition of an extra U(1) gauge symmetry resulting in a massive neutral vector boson known as a Z boson [1]. The search presented in this paper assumes a Z boson with the same quark couplings and chiral structure as the SM Z boson but only leptonic decays that violate Lepton Flavour Conservation are allowed. The parameter Q i j , where i, j = 1 . . . 3 represent the three lepton generations, gives the strength of the LFV couplings relative to the SM couplings. The ATLAS Collaboration placed lower limits of 3.3, 2.9, and 2.7 TeV on the mass of a Z boson decaying into eµ, eτ, and µτ pairs, respectively, using 3.2 fb −1 of the 13 TeV data [5], while the CMS Collaboration has placed limits up to 4.4 TeV on a Z boson decaying into an eµ final state using 35.9 fb −1 [6]. Following the same methodology as in Ref. [5], this paper assumes only one LFV coupling is different from zero at any time for the purpose of setting limits on the cross-section times branching of each final state considered. Polarization of τ-leptons is not included in the model, but its impact on the τ-lepton acceptance is found to be negligible.
In RPV SUSY, the superpotential terms allowing LFV are expressed as 1 2 λ i jk L i L j E c k + λ i jk L i Q j D c k , where L and Q are the SU(2) doublet superfields of leptons and quarks, E and D are the SU(2) singlet superfields of charged leptons and down-type quarks, λ and λ are Yukawa couplings, and the indices i, j, and k denote generations. A τ-sneutrino (ν τ ) may be produced in pp collisions by dd annihilation and subsequently decay into eµ, eτ, or µτ. Although onlyν τ is considered in this paper, results apply to any sneutrino flavor. For the theoretical prediction of the cross-section times branching ratio, theν τ coupling to first-generation quarks (λ 311 ) is assumed to be 0.11 for all channels. As in the Z model, each lepton-flavor-violating final state is considered separately. It is assumed that λ 312 = λ 321 = 0.07 for the eµ final state, λ 313 = λ 331 = 0.07 for the eτ final state, and λ 323 = λ 332 = 0.07 for the µτ final state. These values are consistent with benchmark models used in previous ATLAS and CMS searches [5,7,8]. The CMS Collaboration has recently excluded R-parity-violating supersymmetric models below 1.7 TeV for λ 132 = λ 231 = λ 311 = 0.01 [6]. Various models introduce extra spatial dimensions to reduce the value of the Planck mass and resolve the hierarchy problem. The search described in this paper presents interpretations based on the Arkani-Hamed-Dimopoulos-Dvali (ADD) model [9], assuming n = 6, where n is the number of extra dimensions, and on the Randall-Sundrum (RS) model [10] with one extra dimension. Due to the increased strength of gravity at short distances in these models, pp collisions at the LHC could produce states exceeding the threshold mass (m th ) and form black holes. For the models considered, m th is assumed to be equivalent to the extra-dimensional Planck scale. For masses beyond 3-5m th , it is expected that thermal black holes would be produced [11,12], characterized by high-multiplicity final states. The search presented in this paper focuses on the mass region below 3-5 m th , known as the quantum gravity regime [13][14][15], where production of non-thermal (or quantum) black holes is expected and these black holes could decay into two-particle final states, producing the topology investigated in this paper. Non-thermal quantum black holes would have a continuum mass distribution from m th up to the beginning of the thermal regime. For the models considered in this paper, the thermal regime is assumed to start at 3m th . The decay of quantum black holes would be governed by a yet unknown theory of quantum gravity. The two main assumptions of the extra-dimensions models considered [4] in this paper are (a) gravity couples with equal strength to all SM particle degrees of freedom and (b) gravity conserves local symmetries (color and electric charge) but can violate global symmetries such as lepton-flavor and baryon-number conservation. Following these assumptions, the branching ratio to each final state is calculated. QBHs decaying into different-flavor, opposite-charge lepton pairs are created via qq or gg annihilation. The branching ratio to is 0.87% (0.34%) for a qq (gg) initial state [4]. These models were used in previous ATLAS and CMS searches for quantum black holes in dijet [16][17][18], lepton+jet [19], photon+jet [20], eµ [6], and same-flavor dilepton [21] final states.

The ATLAS detector
The ATLAS detector [22] is a general-purpose particle detector with approximately forward-backward symmetric cylindrical geometry.1 It is composed of four main components, each responsible for identifying and reconstructing different types of particles: the inner detector (ID), the electromagnetic and hadronic calorimeters, and the muon spectrometer (MS). Each of the subdetectors is divided into two components, barrel and endcap, to provide coverage close to 4π in solid angle. In addition, two magnet systems allow charge and momentum measurements: an axial magnetic field of 2.0 T provided by a solenoid surrounding the ID and a toroidal magnetic field for the MS. The ID, the closest component to the interaction point, is used to reconstruct the trajectories of charged particles in the region |η| < 2.5 and measure their momenta. It is composed of three subsystems: (1) the silicon pixel detector, including an additional inner layer at a radius of 3.2 cm added in 2015 [23,24], (2) the semiconductor tracker, used in conjunction with the silicon pixel detector to determine primary and secondary vertices with high precision thanks to their high granularity, (3) the transition radiation tracker, providing additional tracking in the region |η| < 2.0 and electron identification.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr presampler covering |η| < 1.8, to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillatingtile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements respectively. Surrounding the calorimeter system, the MS is the subdetector furthest from the interaction point. It consists of three layers of precision tracking chambers and fast detectors for triggering on muons. Tracking coverage is provided for |η| < 2.7 by three layers of precision drift tube chambers, with cathode strip 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the center of the LHC ring, and the y-axis points upward. The x-y plane is referred to as the transverse plane, used to define quantities such as the transverse momentum (p T ). Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2). Angular distance is measured as ∆R = (∆η) 2 + (∆φ) 2 . chambers in the innermost layer for |η| > 2.0, while trigger coverage is provided by resistive plate and thin gap chambers for |η| < 2.4.
The trigger and data-acquisition system is based on two levels of online event selection [25]: the level-1 trigger and the high-level trigger. The level-1 trigger is hardware-based and uses a subset of detector information to provide quick trigger decisions and reduce the accepted rate to 100 kHz. The high-level trigger is software-based and exploits the full detector information to further reduce the acceptance rate to about one kHz.

Data and simulated samples
The data sample used for this analysis was collected during 2015 and 2016 from pp collisions at a center-of-mass energy of 13 TeV. After selecting periods with stable beams and applying data-quality requirements, the total integrated luminosity is 36.1 fb −1 with an uncertainty of 2.1%, derived following a methodology similar to that detailed in Ref.
[26] from a calibration of the luminosity scale using x-y beam-separation scans.
The pp → Z → samples were generated at leading order (LO) using the generator P 8.186 [27] with the NNPDF23LO [28] parton distribution function (PDF) set and the A14 [29] set of tuned parameters. Signal samples for 25 mass points ranging from 0.5 TeV to 5 TeV were generated in 0.1 TeV steps from 0.5 to 2 TeV, 0.2 TeV steps from 2 to 3 TeV, and 0.5 TeV steps from 3 to 5 TeV. The production cross-section was calculated with the same generator used for simulation. The cross-section and signal shape in the dilepton invariant mass distribution were corrected from LO to next-to-next-to-leading order (NNLO) in the strong coupling constant with a rescaling that depends on the dilepton invariant mass and which was computed with VRAP 0.9 [30] and the CT14NNLO PDF [31] set. This correction is applied as a multiplicative factor of about 0.98 at a dilepton invariant mass m of 3 TeV. No mixing of the Z boson with the Z and γ * bosons is included.
The dd →ν τ → samples were generated at LO with M G 5_ MC@NLO v2.3.3 [32] interfaced to the P 8.186 parton shower model with the NNPDF23LO PDF set and the A14 tune. The signal samples were generated at the same masses as for the Z model described above. The cross-section was calculated at LO with the same generator used for simulation and corrected to next-to-leading order (NLO) using L T v2.2 [33].
The pp → QBH → samples were generated with the program QBH 3.00 [34] using the CTEQ6L1 [35] PDF set and the A14 tune, for which P 8.183 provides showering and hadronization. For each extra-dimensional model, eleven m th points in 0.5 TeV steps were produced: from 3 to 8 TeV for the ADD n = 6 model and from 1 to 6 TeV for the RS n = 1 model. The production cross-section was calculated with the same generator used for simulation. These two models differ in the number and nature of the additional extra dimensions (large extra dimensions for ADD and one highly warped extra dimension for RS). In particular, the ADD model produces black holes with a larger gravitational radius and hence the parton-parton cross-section for this model is larger than for the RS model. Therefore, the m th range of the generated samples differs for the two models.
The SM background in the LFV dilepton search is due to several processes which produce a final state with two different-flavor leptons. For the eµ mode, the dominant background contributions originate from tt and single-top production, with the subsequent decays of the top quark producing leptonically decaying W bosons. Other backgrounds originate from diboson (WW, W Z, and Z Z) production, and τ-lepton pair production qq → Z/γ * → ττ, which both produce different-flavor final states, through the leptonic decays of the W and Z bosons or the τ-leptons. They contribute about 15% and 1% of the background, respectively. Multijet and W+jets processes contribute due to the misidentification of jets as leptons and are the dominant background for the final states with a τ-lepton.
Backgrounds from top-quark production include tt and single-top with an associated W boson (tW). Both were generated at NLO using the P -B [36][37][38]  , preserving all spin correlations. The h damp parameter, which controls the p T of the first emission beyond the Born configuration in P -B , was set to the mass of the top quark. The main effect of this parameter is to regulate the high-p T emission against which the tt system recoils. The mass was set to the top quark mass of 172.5 GeV. The EvtGen 1.2.0 program [43] was used for the properties of band c-hadron decays. A value of 831 +20 −29 (scale) +35 −35 (PDF+α S ) +23 −22 (mass uncertainty) pb is used for the tt production cross-section, computed with T ++ 2.0 [44], incorporating NNLO QCD corrections, including resummation of next-to-next-to-leading logarithmic (NNLL) soft gluon terms. A Wt production cross-section of 71.7 ± 3.8 pb is used, as computed in Ref.
Diboson processes producing at least two charged leptons were simulated using the S 2.2.2 generator [46]. The matrix elements contain all diagrams with four electroweak vertices. Fully leptonic decays were calculated for up to one parton (four leptons, or two leptons and two neutrinos) or zero partons (three leptons and one neutrino) at NLO and up to three partons at LO using the C [47] and O L [48] matrix-element generators and merged with the S parton shower [49] using the ME+PS@NLO prescription [50]. The CT10 PDF set was used in conjunction with the default parton shower tuning provided by the authors of S . Inclusive cross-section values of 1.28, 4.51, and 10.64 pb are used for Z Z, W Z, and WW production, respectively.
Events containing W or Z bosons are generated using P -B v2 interfaced to the P 8.186 parton shower model. The CT10 PDF set is used in the matrix element. The AZNLO set of tuned parameters [51] is used, with PDF set CTEQ6L1, for the modelling of non-perturbative effects. The EvtGen 1.2.0 program is used for the properties of b-and c-hadron decays. Photos++ 3.52 [52] is used for QED emissions from electroweak vertices and charged leptons. The W/Z samples are normalised with the NNLO cross sections. This background contribution is normalized to an inclusive cross-section of 1.9 nb, calculated for m >60 GeV.
Processes such as W+jets and multijet production with jets that are misidentified as leptons were estimated through a combination of data-driven methods and simulation, detailed in Section 5. The W+jets contribution was estimated with the aid of the S 2.2.2 simulated samples. Matrix elements were calculated for up to two partons at NLO and four partons at LO using Comix and OpenLoops and merged with the S parton shower [49] according to the ME+PS@NLO prescription [50]. The overall cross-section times branching ratio for the W ± → ± ν+jets events is taken to be 59.6 nb.
For all samples used in this analysis, the effects of multiple proton-proton interactions per bunch crossing (pileup) were included by overlaying minimum-bias events simulated with P 8.186 using the ATLAS A14 set of tuned parameters [53] and reweighting the simulated events to reproduce the distribution of the number of interactions per bunch crossing observed in the data. The generated events were processed with the ATLAS simulation infrastructure [54], based on G 4 [55], and passed through the trigger simulation and the same reconstruction software used for the data.

Event reconstruction and selection
This search is optimized to look for new phenomena in the high mass range. Events are selected if they satisfy a single-muon or single-electron trigger with a p T threshold of 50 GeV for muons and 60 or 120 GeV for electrons. The single-electron trigger with higher p T threshold has a looser identification requirement, resulting in an increased trigger efficiency at high p T . Electron candidates are formed by associating the energy in clusters of cells in the electromagnetic calorimeter with a track in the ID [56]. A likelihood discriminant suppresses contributions from hadronic jets, photon conversions, Dalitz decays, and semileptonic heavy-flavor hadron. The likelihood discriminant utilizes lateral and longitudinal calorimeter shower shapes plus tracking and cluster-track matching quantities. The discriminant criterion is a function of the p T and |η| of the electron candidate. Two operating points are used in this analysis, as defined in Ref. [57]: Medium and Tight. The Tight working point (85% efficient at p T = 65 GeV determined with Z → ee events) is required for electron candidates. Electron candidates must have p T > 65 GeV and |η| < 2.47, excluding the region 1.37 < |η| < 1.52, where the energy reconstruction performance is degraded due to the presence of extra inactive material. Further requirements are made on the transverse and longitudinal impact parameters of the track, which is the distance between the z-position of the point of closest approach of the track in the ID to the beamline and the z-coordinate of the primary vertex relative to the primary vertex of the event (d 0 and ∆z 0 ). The requirements are the following: |d 0 /σ d 0 | < 5 and |∆z 0 sin θ| < 0.5 mm. Candidates are required to satisfy relative track-based (as defined above for muon candidates) and calorimeter-based isolation requirements with an efficiency of 99% to suppress background from non-prompt electrons originating from heavyflavor semileptonic decays, charged hadrons, and photon conversions from π 0 decays. The sum of the calorimeter transverse energy deposits (excluding the electron itself) in an isolation cone of size ∆R = 0.2 divided by the electron p T is the discriminant used in the calorimeter-based isolation criterion. For the reducible background estimation, electron candidates passing the Medium working point (95% efficient at p T = 65 GeV determined with Z/γ * → ee events) are referred to as "loose electrons".
Candidate muon tracks are initially reconstructed independently in the ID and the MS. The two tracks are input to a combined fit which takes into account the energy loss in the calorimeter and multiple scattering. Muon identification is based on information from both the ID and MS to ensure that muons are reconstructed with the optimal momentum resolution up to very high p T using the High-p T operating point [58]. Only tracks with hits in each of the three stations of the muon spectrometer are considered. This provides a muon transverse momentum resolution of about 10% at 1 TeV. Moreover, muon candidates are required to be within the ID acceptance region2 of |η| < 2.5, fulfill |d 0 /σ d 0 | < 3 and |∆z 0 sin θ| < 0.5 mm, have a transverse momentum larger than 65 GeV, and fulfill a track-based isolation criterion with an efficiency of 99% over the full range of muon momenta to further reduce contamination from non-prompt muons. The scalar sum of the transverse momenta of tracks (excluding the muon itself) in an isolation cone of size ∆R = 0.2 divided by the muon p T is the discriminant used in the track-based isolation criterion. For the reducible background estimation, muon candidates fulfilling all selection criteria except the isolation criterion are called "loose muons".
Jets are reconstructed using the anti-k t algorithm [59] with a radius parameter of 0.4 using energy clusters [60] of calorimeter cells as input. Jet calibrations [61] derived from √ s = 13 TeV simulated data and from collision data taken at 13 TeV are used to correct the jet energies and directions to those of the particles from the hard-scatter interaction.
Hadronic decays of τ-leptons are composed of a neutrino and a set of visible decay products (τ had-vis ), typically one or three charged pions and up to two neutral pions. The reconstruction of τ-leptons and their visible hadronic decay products starts with jets reconstructed from topological clusters [62]. The τ had-vis candidates must have |η| < 2.5 with the transition region between the barrel and endcap calorimeters (1.37 < |η| < 1.52) excluded, a transverse momentum greater than 65 GeV, one or three associated tracks with ±1 total electric charge. Their identification is performed using a multivariate algorithm that employs boosted decision trees (BDT) using shower shape and tracking information to discriminate against jets. All τ had-vis candidates are required to fulfill the "loose" identification requirements of Ref. Only events with exactly two different-flavour leptons are chosen. As such, there is no overlap between the three channels considered: eµ, eτ, µτ. They must have a reconstructed primary vertex, defined as the vertex whose constituent tracks have the highest sum of p 2 T , and exactly two reconstructed different-flavor lepton candidates meeting the above-mentioned criteria. Events with an additional electron, muon or τ had-vis are vetoed. For the eµ channel only, events with an extra electron or muon fulfilling the "loose" criteria are also vetoed, including events used for the purpose of the reducible background estimation. For all three channels, the lepton candidates must be back-to-back in the transverse plane with ∆φ( , ) > 2.7. The invariant mass of the dilepton pair is used as the discriminant. No requirement is made on the respective charges of the leptons since it reduces the signal efficiency by as much as 6% for the highestmass signals considered due to charge misassignment without a significant effect on the background rejection. To account for differences between data and simulation, corrections are applied to the lepton trigger, reconstruction, identification, and isolation efficiencies as well as the lepton energy/momentum resolution and scale [56][57][58]63].
Double-counting of leptons and jets is avoided by applying an overlap removal algorithm based on the ∆R distance metric. First, jets within ∆R < 0.2 of any identified muons or electrons are removed. Then, any muons and electrons within 0.2 < ∆R < 0.4 from the jet axis are removed.
The missing transverse momentum vector ( ì E miss T ) is defined as the negative vector sum of the transverse momenta of all identified physics objects and an additional soft term. The soft term is constructed from all tracks that are associated with the primary vertex, but not with any selected physics object. In this way, the missing transverse momentum incorporates the best calibration of the jets and the other identified physics objects, while maintaining pileup independence in the soft term [65].
The dominant background for the eµ channel is tt production, which can be suppressed by rejecting events that contain one or more b-jets (b-veto).
For a Z boson with a mass of 1.5 TeV, the fractions of events that pass all of the selection requirements are approximately 45%, 45%, 20%, and 15% for the eµ, eµ with b-veto, eτ, and µτ final states, respectively.
For the reducible background estimation in the eτ and µτ channels, the transverse mass m T of a lepton and the missing transverse momentum is defined as where p T is the transverse momentum of the lepton, E miss T is the magnitude of the missing transverse momentum vector, and ∆φ( , E miss T ) is the azimuthal angle between the lepton and ì E miss T directions.
For the dilepton mass calculation in the eτ and µτ channels, the missing momentum from the neutrino in the hadronic decay of a τ-lepton is estimated and added to the four-momentum of the τ had-vis candidate. At the considered momenta, the hadronic decay of the τ-lepton results in the neutrino and the resultant jet being nearly collinear. The neutrino four-momentum is reconstructed from the magnitude of the missing transverse momentum and the direction of the τ had-vis candidate. This technique significantly improves the dilepton mass resolution and search sensitivity [7]. For a simulated Z boson with a mass of 2 TeV, the mass resolution improves from 8% (17%) to 4% (12%) in the eτ (µτ) channel.

Background estimation
The background processes for this search can be divided into two categories: reducible and irreducible. The latter is composed of processes which produce two different-flavor prompt leptons, including Z/γ * → ττ, tt, single-top, and diboson production. These processes are modeled using simulated samples and normalized to their theoretically predicted cross-sections. Reducible backgrounds originate from jets misreconstructed as leptons and require the use of data-driven techniques. The contribution from reducible backgrounds is small in the eµ channel, about 5%, whereas in the eτ and µτ channels they are the leading components and make up 50−60% of the total background.

Top-quark background extrapolation
The simulated samples used to estimate single-top-quark and tt production are statistically limited for dilepton invariant masses above 1 TeV. Therefore, for m >700 GeV, the tt plus single-top contributions are evaluated using monotonically decreasing functions fitted to the m distribution. Two functional forms are chosen for their stability when varying the fit range and for the quality of the fit: where a, b and c are free parameters in the fit. To account for fit variations, the lower and upper limits of the fit range were varied in 25 GeV steps between 200−300 GeV and 1000−1200 (800−1000) GeV for eµ (eτ and µτ). The nominal extrapolation is taken as the average of all the tested fit ranges using both functional forms. The extrapolation is found to agree within statistical uncertainties with the simulated data. For each mass bin, the up and down variations obtained from varying the fit parameters are combined in quadrature with the uncertainty of the fit range variation. This uncertainty is 32% at 2 TeV for the eµ channel.

Reducible background
The main reducible backgrounds are W+jets and multi-jet production. The contribution to the reducible background from muons originating from decays of hadrons in jets is found to be negligible compared with the contribution from fake electrons and τ-leptons. Therefore, in the eµ channel, where reducible backgrounds are a small contribution to the total, non-prompt muons are neglected and only events with one prompt muon and a jet faking an electron are considered. In channels involving taus, however, reducible backgrounds are more significant, and so contributions from both electrons and muons, primarily non-prompt leptons from heavy flavor decays, are taken into account. However, the dominant source of reducible background in these channels remains fake taus from gluon-initiated jets.

eµ channel: matrix method
For the eµ channel, the matrix method is employed, as detailed in Ref. [21]. The selection criteria are loosened for electron candidates to create a sample of events with a muon and a "loose" electron as defined in Section 4. These events are referred to as "loose", while those in which both the electron and muon pass all selections are "tight". The probability of a "loose" electron matched to a generated electron to pass the full object selection (the "real efficiency") is evaluated from Z → ee simulated events, while the probability that a jet is misidentified as an electron (the "fake rate") is obtained in a multijet-enriched data sample. To suppress the W+jets contribution to the multijet-enriched sample, its events are required to have E miss T < 25 GeV and m T < 50 GeV as well as to pass the signal region selection outlined in Section 4. Both the real efficiency and the fake rate are determined as a function of p T , and they are used to estimate the reducible background contribution by solving a system of linear equations involving the numbers of loose and tight events in the signal region. Residual contaminations from W+jets and other SM background processes (top, diboson, and Z → ) in the multi-jet CR are subtracted using simulation. This background is estimated up to around 1.5 TeV, where there are no data. However, at this stage, the expectation of this background is well below one event, and generally negligible compared to diboson and top quark processes.
The uncertainties associated with the matrix method are evaluated by considering systematic effects on the electron fake rate. Uncertainties of the real electron efficiency have a negligible impact on the estimation and are not considered. The systematic uncertainties in the fake rate include • the choice of multijet-enriched region, • uncertainties on the MC subtraction in the multijet-enriched region, and • the difference in the fake rates obtained using this method and those obtained from simulated W+jets events.
The overall uncertainty of the eµ reducible background is about 30%. Given that in the eµ channel this contribution is about 7% of the total background over the invariant mass range considered, the uncertainties in the estimation method have a small impact on the results.

eτ and µτ channels: W +jets estimate
The dominant background for the eτ and µτ channels is the W+jets process, where a jet is misidentified as a τ had-vis candidate. It is estimated using simulated events with each jet weighted by its probability to pass the τ-lepton identification as measured in data. This not only ensures the correct fake rate but also improves the statistical precision of the estimate, since events failing the τ-lepton identification requirements are not discarded. The τ had-vis fake rate is measured in a W→ e/µ+jets control region as a function of the p T , η, and number of tracks of the τ had-vis candidates. The W+jets-enriched control region uses the same selection as the signal region, but reverses the back-to-back criterion to ∆φ( , ) < 2.7, and uses τ-leptons fulfilling all requirements except identification, although a minimum requirement on the BDT discriminant is retained. Only events with exactly one electron or muon fulfilling all selection criteria, as well as m T > 80 GeV to enrich the W+jets contribution, are used. Events where the invariant mass of the e or µ and the τ had-vis candidate is between 80 and 110 GeV are vetoed to reduce contamination from Z boson decays. Contributions from non-W+jets processes are subtracted using simulation. The τ had-vis candidates present in the remaining events are dominated by jets. The contribution of events with non-prompt electrons is estimated from simulation and found to be less than 1%. The τ had-vis fake rate is defined as the fraction of τ had-vis candidates in the sample that also pass the τ had-vis identification. This rate is used to weight simulated W+jets events. The resulting distribution obtained for the W+jets is validated in the W+jets-enriched control region, where good agreement is found between data and the expected SM background processes.
The uncertainties in the τ had-vis fake rate are evaluated from • the modeling of the "loose" τ had-vis identification requirement in simulation, • the statistical uncertainty of the data-driven estimation of the τ-lepton fake rate, and • the differences in τ-lepton fake rate between signal and control regions.
These errors are detailed in the following paragraphs.
The τ had-vis fake rate is re-evaluated when removing the m T > 80 GeV requirement to check the contamination from non-W+jets processes. The effect on the fake rate and the final estimation of the W+jets background is about 2%.
The statistical uncertainty of the fake rate in the control regions is propagated through the estimate. The impact is small at low m τ but is the leading uncertainty of the fake rate in the range m τ > 1 TeV.
The jet composition of the fake τ had-vis background is evaluated from simulated W+jets events. The control region where the τ had-vis fake rate is evaluated should have a a jet composition similar to that in the signal region. Therefore, W+jets simulated events are used to investigate the difference between the fake rates measured in the W+jets control and signal regions. The comparison reveals a slightly lower fake rate in the signal region, consistent with the lower expected gluon contribution. The relative difference between these fake rates is assigned as a systematic uncertainty, which contributes an uncertainty of about 8% to the total background at m τ = 1 TeV.

eτ and µτ channels: multijet estimate
The multijet background contributions in the eτ and µτ channels are evaluated using events in three control regions (R 1 , R 2 , R 3 ). The events must pass the selection for the signal region, except that in R 1 and R 3 the electron/muon must fail isolation and the τ had-vis candidate must fail identification, and in R 1 and R 2 the leptons must have the same electric charge and the electron/muon p T must be less than 200 GeVto avoid signal contamination. For a Z boson with a mass of 500 GeV, the lowest signal mass considered in this paper, the contamination from the signal processs in R 2 is found to be below 1%. The region definitions are summarized in Table 1. The background contribution is estimated as The transfer factor N R 2 /N R 1 is calculated as a function of the dilepton mass to encapsulate correlations between m τ and the isolation and identification requirements, and it is fitted with a polynomial function. In each of the regions defined, the contributions from other SM processes, such as W+jets, Z+jets, Z/γ * → , diboson, and top-quark production, are subtracted using simulation. The contribution from the multijet background is ∼60% (∼20%) of the W+jets background for the eτ (µτ) channel, corresponding to ∼25% (∼10%) of the total expected background. Object selection Lepton-pair charges The multijet background is estimated using a transfer factor obtained using same-charge lepton pairs and applied to opposite-charge plus same-charge lepton pairs. To check the validity of this procedure, the multijet background is also estimated using a transfer factor obtained with opposite-charge pairs. The difference between the resulting shape and transfer factors is assigned as a systematic uncertainty. The impact of this uncertainty is about 7% at 1 TeV.
The statistical uncertainties in the m τ -dependent transfer factor and the subtraction of simulated events are propagated to the final estimate and assigned as a systematic uncertainty. The overall uncertainty is 50% (15%) at 1 TeV for the eτ (µτ) channel. The uncertainty in the µτ channel is smaller because the transfer factor is found to have a negligible effect on the dilepton invariant mass, and the transfer-factor fit uncertainties are reduced.

Reducible background validation
The validity of the background estimation is checked in the W+jets control region. Figures 1 and 2 show the electron, muon, and τ-lepton transverse momentum, τ invariant mass and jet multiplicity distributions for the eτ and µτ channels, respectively, in the W+jets control region. Good agreement is observed between the data and the background prediction. The contribution from each SM background for each of the final states in the signal region is given in Section 7.  Mismodeling of the muon momentum resolution at the TeV scale from residual misalignment of the muon precision chambers can alter the signal and background shapes. A corresponding uncertainty is obtained from studies performed in dedicated data-taking periods with no magnetic field in the MS. The muon reconstruction efficiency is affected at high p T by possible large energy losses in the calorimeter. The associated uncertainty is estimated by comparing studies of Z → µµ data events extrapolated to high p T with the results predicted by simulation [67]. The effect on the muon reconstruction efficiency was found to be approximately 3% per TeV as a function of muon p T .
The uncertainty of the electron identification efficiency extrapolation is determined from the differences in the electron shower shapes in the EM calorimeters between data and simulation in the Z → ee peak, which are propagated to the high p T electron sample. The effect on the electron identification efficiency is 2% and is independent of p T for electrons with transverse momentum above 150 GeV [67].
The treatment of systematic uncertainties for τ-leptons with p T up to 100 GeV is detailed in Ref.
[62]. An additional uncertainty of 20% per TeV is assigned to the reconstruction efficiency of τ-leptons with p T > 100 GeV to account for the degradation of the modeling and reconstruction efficiency from track merging, derived from studies in simulation and in dijet data events at 8 TeV [68].
The missing transverse momentum uncertainty is derived from the uncertainties of the momenta of physics objects and uncertainties of the soft term determined by comparisons with simulation.
A mis-modeling of the dilepton p T variable is found in the tt simulation. After reweighting to data in a tt control region, an uncertainty is assigned to account for the effect on the dilepton invariant mass spectrum.
A pile-up modeling uncertainty is estimated by varying the distribution of pile-up events in the reweighting of the MC, to cover the uncertainty on the ratio between the predicted and measured inelastic cross-section in the fiducial volume defined by M X > 13 GeV where M X is the mass of the hadronic system [69].
The uncertainty of 2.1% in the luminosity applies to the signal and to backgrounds derived from simulations.
The uncertainties of the reducible background estimation in the eµ channel, and the τ-lepton fake rate, the multijet transfer factor calculation, and the top-quark extrapolation are presented in Section 5.
The PDF uncertainties are the dominant systematic uncertainties affecting the background estimates, together with the uncertainty on the extrapolation to estimate the top-quark background contribution at high mass. The contribution from PDF uncertainties is estimated using different PDF sets and eigenvector variations within a particular PDF set for the top-quark, diboson, and W+jets backgrounds. The CT10 PDF uncertainty due to eigenvector variations is evaluated through the use of LHAPDF [70] following the prescriptions of Ref. [71]. The uncertainty related to the choice of PDF is evaluated by comparing the results with those from the central value of other PDF sets: MMHT2014 [72], NNPDF3.0 [73], and CT14 [31]. PDF-related uncertainties in the signal shape are not considered. The uncertainties of the m modeling in tt events are obtained using separate simulated samples generated with the renormalization scale and h damp parameter varied by factors of 2 and 1/2, and are referred to as"Top scale" in Table 2. These uncertainties for W+jets are not considered as they are found to be small, given that this background is mainly composed of real lepton (e or mu) and fake tau pairs. For the diboson background prediction, the PDF systematic is the leading uncertainty.
Experimental systematic uncertainties common to signal and background processes are assumed to be correlated. The systematic uncertainties of the estimated SM background and signal yields are summarized in Tables 2 and 3. For signal processes, only experimental systematic uncertainties are considered. The simulated samples contribute a 3% statistical uncertainty to the overall signal acceptance times efficiency.

Results
Tables 4-7 show the expected and observed numbers of events in the low and high mass regions for each channel. The eµ background is dominated by tt and diboson events, while W+jets events are dominant for the eτ and µτ final states. Figure 3 shows the dilepton invariant mass distributions for the eµ, eµ with b-veto, eτ, and µτ channels. The largest deviation found in the data is a deficit in the 1.1-1.4 TeV range of the eµ channel, with a global significance of 1.8 standard deviations, obtained using the BumpHunter program [74]. Due to the parton luminosity tail in the LFV Z model, the impact of the deficit in the 1.1-1.4 TeV range is also seen in the observed limit for Z boson masses up to 4-5 TeV. No significant excess is found in any channel.
Since no deviations from the SM prediction are observed, model-dependent exclusion limits are extracted using a Bayesian method implemented with the Bayesian analysis toolkit [75]. A binned likelihood function is constructed from the product of the Poisson probabilities of the observed and expected numbers of events in each m mass bin as in Ref. [5]. A 95% credibility level (CL) Bayesian upper limit is placed on the signal cross-section times branching ratio.
Expected exclusion limits are obtained by generating 1000 pseudo-experiments for each signal mass point. The median value of the pseudo-experiment distribution of the 95% CL Bayesian upper limit is taken as the expected limit. The one-and two-standard deviation intervals of the expected limit are obtained by finding the 68% and 95% intervals of the pseudo-experiment upper limit distribution, respectively.
The invariant mass spectrum for each final state is analyzed in 60 bins from 120 GeV to 10 TeV. The bin width is around 7% of the dilepton mass throughout the whole range. The predicted width of the Z boson, 3% for m Z = 2 TeV, is smaller than the detector resolutions for the eµ and the µτ channels, which are approximately 8% and 12%, respectively, at the same Z boson mass. For the eτ final state the detector  resolution is 4% at m Z = 2 TeV, comparable to the Z boson width. The width of theν τ is below 1%, and hence the resolution of the detector is larger than the width for each of the final states investigated. Figures 4-6 show the observed and expected 95% CL upper limits on the production cross-section times branching ratio of the Z , RPV SUSYν τ and QBH models for each of the final states considered. The extracted limits are not as strong for signal masses above about 2.5 TeV due to a decrease in acceptance at very high p T and, specifically to the LFV Z model, low-mass signal production due to PDF suppression. The results are summarized in Table 8. The acceptance times efficiency of the ADD and RS QBH models agree within 1%, and the same prediction is used for the limit extraction.
Results expressed in terms of the coupling limts can be directly compared to those obtained from precision low energy experiments [76]. For the Z model the cross-section times branching ratio is proportional to Q 2 , and the same quark couplings as the SM Z boson are used. The limits on Q are shown in Figure 7 as a function of m Z for the three channels. The most stringent coupling limits from low-energy experiments are from µ-to-e conversion and µ → eee for the eµ channel, from τ → eee and τ → eµµ for the eτ channel, and from τ → µµµ and τ → eµµ for the µτ channel. The current experimental limits on these processes are converted to coupling limits using the formulae of Ref. [77] and are shown in Figure 7. For the eτ and µτ channels, the observed limit is restricted up to the Z mass point of 4 TeV. This because for the higher mass points, the limit on Q becomes sufficiently large that the total width of the Z would be significantly larger than the experimental resolution and violate our assumptions on that. For the eµ   Table 8: Expected and observed 95% credibility-level lower limits on the mass of a Z boson with leptonflavor-violating couplings, a supersymmetric τ-sneutrino (ν τ ) with R-parity-violating couplings, and the threshold mass for quantum black-hole production for the ADD n = 6 and RS n = 1 models.  sneutrino (ν τ ), and (c) QBH ADD and RS production cross-section times branching ratio for decays into an eµ final state with and without the b-veto requirement. The signal theoretical cross-section times branching ratio lines for the Z model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process, while the RPV SUSYν τ includes the NLO K-factor calculated using LoopTools [33]. The acceptance times efficiency of the ADD and RS QBH models agree within 1%, and the same curve is used for limit extraction. The expected limits are shown with the ±1 and ±2 standard deviation uncertainty bands on the results with the b-veto requirement. sneutrino (ν τ ), and (c) QBH ADD and RS production cross-section times branching ratio for decays into an eτ final state. The signal theoretical cross-section times branching ratio lines for the Z model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process, while the RPV SUSYν τ includes the NLO K-factor calculated using LoopTools [33]. The acceptance times efficiency of the ADD and RS QBH models agree within 1%, and the same curve is used for limit extraction.

Model
The expected limits are shown with the ±1 and ±2 standard deviation uncertainty bands. sneutrino (ν τ ), and (c) QBH ADD and RS production cross-section times branching ratio for decays into an µτ final state. The signal theoretical cross-section times branching ratio lines for the Z model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process, while the RPV SUSYν τ includes the NLO K-factor calculated using LoopTools [33]. The acceptance times efficiency of the ADD and RS QBH models agree within 1%, and the same curve is used for limit extraction.
The expected limits are shown with the ±1 and ±2 standard deviation uncertainty bands.
channel, the coupling limits in this paper do not compete with those from low-energy experiments, but for the eτ and µτ channels, the coupling limits in this paper are more stringent, though they require additional assumptions on the quark couplings.
For theν τ model, the dependence on the couplings is more complicated because both the production and the decay violate lepton-flavor conservation. Assuming only the dd and couplings, the cross-section times branching ratios are proportional to the Yukawa couplings |λ 311 λ 3i j | 2 /(3|λ 311 | 2 + 2|λ 3i j | 2 ), where i j = 12, 13, and 23 for the eµ, eτ, and µτ channels, respectively. The factor 3 in the denominator accounts for color and the factor 2 is because both final-state charge combinations are allowed ( ± ∓ ). The limits on |λ 3i j | versus |λ 311 | are shown in Figure 8 forν τ masses of 1 TeV, 2 TeV, and 3 TeV. The most stringent coupling limits set by low-energy experiments derive from µ-to-e conversion for the eµ channel, from τ → eη for the eτ channel, and from τ → µη for the µτ channel. The coupling limits in Ref. [3] are scaled to current experimental limits on these processes [78] and are shown in Figure 8. For the eµ channel, the coupling limits in this paper do not compete with those from low-energy experiments, but for eτ and µτ channels, the coupling limits in this paper are more stringent.

Conclusions
A search for a heavy particle decaying into an eµ, eτ, or µτ final state is conducted using 36.1 fb −1 of proton-proton collision data at √ s = 13 TeV recorded by the ATLAS detector at the Large Hadron Collider. The Standard Model predictions are consistent with the data. From the eµ, eτ, and µτ final states, Bayesian lower limits at 95% credibility level are set on the mass of a Z vector boson with lepton-flavorviolating couplings at 4.5, 3.7, and 3.5 TeV, respectively; on the mass of a supersymmetric τ-sneutrino with R-parity-violating couplings at 3.4, 2.9, and 2.6 TeV; and on the threshold mass for quantum blackhole production in the context of the Arkani-Hamed-Dimopoulos-Dvali (Randall-Sundrum) model at 5.6 (3.4), 4.9 (2.9), and 4.5 (2.6) TeV. The quantum black hole limits extracted are below those extracted in dijet searches, since the branching ratio to dijet is expected to be much larger than to dilepton. Coupling limits for the lepton-flavor-violating Z boson andν models are more stringent than those from low-energy experiments for the eτ and µτ modes.