Search for new phenomena in two-body invariant mass distributions using unsupervised machine learning for anomaly detection at $\sqrt{s} = 13$ TeV with the ATLAS detector

Searches for new resonances are performed using an unsupervised anomaly-detection technique. Events with at least one electron or muon are selected from 140 fb$^{-1}$ of $pp$ collisions at $\sqrt{s} = 13$ TeV recorded by ATLAS at the Large Hadron Collider. The approach involves training an autoencoder on data, and subsequently defining anomalous regions based on the reconstruction loss of the decoder. Studies focus on nine invariant mass spectra that contain pairs of objects consisting of one light jet or $b$-jet and either one lepton ($e$, $\mu$), photon, or second light jet or $b$-jet in the anomalous regions. No significant deviations from the background hypotheses are observed.

Searches for new physics phenomena beyond those described by the standard model (SM) require advanced techniques to devise selections that involve a large number of variables characterizing collision events.Furthermore, limited understanding of how new physics would manifest itself has inspired the design of model-independent searches [1].In traditional methods, event selections are optimized to target specific signatures of signals beyond the SM (BSM signals) and to maximize their separation from SM background processes.Alternatively, event selection criteria can be relaxed to target more general signatures, but this reduces the ability to suppress background.
Machine learning (ML) anomaly-detection methods [2][3][4][5][6][7][8][9][10][11][12] provide a new way to study collision events.One such approach uses an autoencoder (AE) [13][14][15][16], a neural network architecture that is commonly used in unsupervised learning.The AE is trained using mostly SM background events and is applied to identify collision events that display kinematic properties different from those of SM events.ATLAS previously used a weakly supervised learning technique for massive dijet final states [17] and an unsupervised machine-learning method to identify anomalous jets in a search for BSM resonances decaying into a Higgs boson and a generic new boson [18].
This Letter presents a generic search for resonances in various two-body final states that applies an anomaly detection method to the event topology, for the first time in ATLAS.Events are triggered by the presence of an isolated lepton to reduce contamination from QCD multijet events.The two-body final states consist of jet þ Y, where the jet can be a light jet or a b jet (containing a b-hadron decay) while Y can be a lepton (electron or muon), a photon, or another light jet or b jet.The highest-p T candidate for each kind of final-state object is used, except for final states with either two light jets or two b jets, where both the leading and subleading light jets or b jets are considered.Triggering on the lepton allows studies of invariant mass distributions below 1 TeV, which would be difficult to model because of trigger threshold effects if jet triggers were used [19].
The data correspond to 140 fb −1 of pp collisions at ffiffi ffi s p ¼ 13 TeV collected by the ATLAS experiment at the Large Hadron Collider (LHC).The ATLAS detector [20] is a multipurpose particle detector with cylindrical geometry [21].It consists of an inner tracking detector (ID) surrounded by a superconducting solenoid, electromagnetic (EM) and hadronic sampling calorimeters, and a muon spectrometer with three toroidal superconducting magnets, providing a near 4π coverage in the solid angle.A two-level trigger system is used to select events for storage.Events used in this analysis were selected on-line by singleelectron or single-muon triggers [22][23][24].An extensive software suite [25] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.
The interaction vertex with the highest sum of the squared transverse momenta (p 2 T ) of all associated tracks with p T > 500 MeV is selected as the primary vertex [26].
Electrons are reconstructed from energy clusters in the EM calorimeter that match a reconstructed track [27].Muons are reconstructed by combining a track from the ID with one from the muon spectrometer [28].Electrons (muons) must have E T > 20 GeV (p T > 20 GeV) and pseudorapidity jηj < 2.47 (jηj < 2.5), excluding the region 1.37 < jηj < 1.52 for electrons.To ensure that selected leptons originate from the primary vertex, their tracks must have jd 0 =σðd 0 Þj < 5 (3) for electrons (muons) and jz 0 sinðθÞj < 0.5 mm for both lepton flavors.Here d 0 and σðd 0 Þ are the transverse impact parameter and its uncertainty, and z 0 is the longitudinal impact parameter along the beam line.Electrons must satisfy the "Tight" likelihood-based identification criterion defined in Ref. [27], and the "FCTight" isolation requirement defined in Ref. [23].Muons must satisfy the "Medium" cut-based identification criterion and the particle-flow-based "PflowTight" isolation requirement [28].Photons are reconstructed from energy deposits in the central EM calorimeter [27].They must have p T > 20 GeV and pseudorapidity jηj < 2.37, excluding the region 1.37 < jηj < 1.52.The "tight" identification requirement and the "tight" isolation requirement are applied, both defined in Ref. [27].Jets are reconstructed using the anti-k t algorithm [29,30] with a radius parameter of 0.4, applied to tracks in the ID and topological clusters [31] processed using a particle-flow algorithm [32].To reduce the effect of additional collisions in the same and neighboring bunch crossings (pileup), jets with p T < 60 GeV and jηj < 2.4 must pass a jet-vertex-tagger requirement that has a 96% selection efficiency for hard-scattered jets [33].Jets containing a b hadron are identified using the 77% efficiency working point of the DL1r b-tagging algorithm [34].Jets failing to meet the b-tagging criteria are identified as light jets.An overlap-removal procedure detailed in Ref. [35] is applied to the selected objects.
The missing transverse energy, E miss T , is calculated from the vector sum of the transverse momenta of all reconstructed objects associated with the primary vertex.To account for soft hadronic activity, a term including tracks associated with the primary vertex, but not with any of the reconstructed objects, is included in the E miss T calculation [36].
Following the reconstruction of these objects, events are first selected by requiring at least one lepton (e or μ, denoted by l) with p l T > 60 GeV, and at least one jet with p T > 30 GeV.A trigger matching requirement [22] is applied to ensure that the reconstructed lepton lies in the vicinity of the corresponding trigger-level object.Jets and b jets are required to have a pseudorapidity of jηj < 2.4.These requirements constitute the event preselection.Nine invariant mass distributions are studied in this analysis.The invariant mass distributions of m jj and m bb are reconstructed from the leading and subleading (b) jets in each event; m jb is reconstructed from the leading jet and leading b jet; and m je (m be ), m jμ (m bμ ), and m jγ (m bγ ) are reconstructed from the leading (b) jet and leading electron, muon, or photon, respectively.An event that contains multiple types of objects can contribute to multiple mass distributions.
Several Monte Carlo (MC) samples were simulated to validate the analysis procedure, although the anomaly detection relies only on data samples.All simulated SM samples, which are used to model the background, include a detailed ATLAS detector simulation [37] based on GEANT4 [38].Samples for the benchmark BSM signal models instead include a faster, less detailed detector simulation [37].Additional pp collisions generated using PYTHIA [39] with the A3 set of tuned parameters [40] were overlaid to simulate the effects of pileup interactions that match the multiplicity of additional collisions in data.
Kinematic features of the final-state objects in the preselected events are structured in a matrix called the rapidity-mass matrix (RMM) which is proposed as an input for machine learning [41].The RMM was tested for an anomaly detection method using MC event generators and was found to produce more robust AE training than when four-momenta are used as input [42].The RMM is a square matrix employed to represent a comprehensive picture of the event in terms of popular variables used for BSM searches, such as E miss T , transverse energies, transverse masses, Lorentz factors, two-particle invariant masses, and two-particle rapidity differences.In this analysis, the reconstructed final-state objects are light jets, b jets, muons, electrons, or photons.A maximum of ten light jets or b jets are considered, along with up to five electrons, muons, and photons.Together with E miss T , a total number of 36 finalstate objects are used to define the RMM.The objects within each type are ordered in decreasing transverse energy.If the number of available objects for a particular type is less than the maximum allowed, the remaining elements in the corresponding row and column of the RMM are filled with zeros.This ensures that the input size is the same for every event.
By construction, all elements of the RMM are defined to be between 0 and 1, and most variables are Lorentzinvariant under boosts along the longitudinal axis.To reduce biases in the shapes of the jet þ Y invariant mass spectra, the nine invariant mass variables are excluded from the RMM.The resulting input dimension is 36 2 − 9 ¼ 1287.The RMM matrix is then flattened to a one-dimensional input vector before being fed into the AE.
The AE is implemented using TensorFlow [43].It comprises two sections, an encoder and a decoder.The encoder compresses the input to a latent dimensional space, whereas the decoder takes the data in the latent layer and decompresses it back to its original size.The network architecture for the encoder contains two hidden layers, with 800 and 400 neurons, respectively, and a latent layer of 200 neurons.The decoder reverses the structure of the encoder, using 400 and 800 neurons for the two hidden layers, and 1287 PHYSICAL REVIEW LETTERS 132, 081801 (2024) neurons for the output layer.The leaky ReLU [44] activation function is applied to the output in all hidden and output layers.The reconstruction loss is defined as the mean squared error between the input and reconstructed values of the dataset.The logarithm of the reconstruction loss, log(Loss), is defined as the anomaly score for each event.Artificial anomalous events with different characteristics were created and used as input to test the separation power of this architecture.The default AE outperforms (convolutional) variational AEs.Furthermore, the selected architecture has shown better separation between anomalous and nonanomalous events than AE architectures with other numbers of neurons.To form the training and validation datasets, 1% of the collision events are randomly selected after the preselection.They provide both a sufficient sample size and a good representation of typical collision events.The chance of finding BSM signal events in these datasets is considered to be negligible.Even if the training dataset is influenced by BSM physics, such contributions would alter the distribution of the log (Loss) of the AE and thus the shape of the mass spectra of the background but would not produce discernible bumps [42].Therefore, the search for localized enhancements in the invariant mass distributions should not be affected significantly.
The training and validation sets are randomly split using a 7∶3 ratio.The Adam optimizer [45] with a learning rate of 0.001 is used to train the network by minimizing the log (Loss) of the training sets.The batch size is 100 and all events in the training set are reshuffled at the beginning of each epoch.Training of the AE is monitored via the reconstruction loss of the validation set, and terminated if this value does not decrease within 30 epochs.To avoid selecting an overtrained model, the training is repeated 50 times with different network initialization and different random splits of the training and validation sets.The performance is found to be stable and the model that gives the median validation loss among these trials is used.
The distribution of the anomaly scores of the collision data is shown in Fig. 1.The data correspond to 140 fb −1 and include the 1% of data used in training.A peak is seen near logðLossÞ ¼ −10.8 from the total of 166 055 597 events after preselection.For comparison, several benchmark BSM models, studied in Refs.[46,47], are overlaid.These representative models are characterized by the presence of an isolated lepton in the final states.The models and their parameters are (1) charged Higgs boson production in association with a top quark, tbH þ with H þ → t b, with the mass of H þ between 0.4 TeV and 2 TeV with a varying step size [48].All top decays are enabled; (2) a Kaluza-Klein gauge boson, W KK , with the SM W boson and a radion ϕ that decays into two gluons, with the mass of W KK between 0.5 TeV and 6 TeV, and the mass difference of W KK and the radion being 0.25 TeV [49]; (3) a model with a Z 0 boson and composite SUð2Þ L fermion doublets that breaks lepton-flavor universality ("composite lepton"), Z 0 → El, with the Z 0 boson mass of 0.5-4 TeV and various mass hypotheses for the composite lepton E decaying to Zl with Z → q q [50]; (4) the sequential standard model (SSM) W 0 → WZ 0 → lνq q, with the mass of the W 0 boson ranging from 0.7 TeV to 6.2 TeV with a varying step size and the mass difference of the W 0 and Z 0 bosons being 0.25 TeV [51]; (5) a simplified dark-matter (DM) model Z 0 → q q, with an axial-vector mediator Z 0 boson whose mass ranges between 0.5 TeV and 6 TeV with a varying step size, where one of the quarks radiates a W boson which decays into lν [52].As can be seen in Fig. 1, the anomaly scores of the BSM processes tend to be larger than those of the collision events, which are all or mostly produced by SM processes.The SSM and DM model tend to have characteristics similar to those of the SM background from the event selection in this analysis, thus yielding lower anomaly scores.Although only one hypothetical mass is shown for each type of BSM model, it was found that events with larger hypothetical particle masses  4) the SSM W 0 → WZ 0 → lνq q; (5) a simplified dark-matter model with an axial-vector mediator Z 0 → q q, where one of the quarks radiates a W boson decaying to lν.The BSM predictions represent the expected number of events from 140 fb −1 of data for heavy particle (H þ , W KK , Z 0 , W 0 , and Z 0 , respectively) masses around 2 TeV.The distributions for the BSM models are smoothed to remove fluctuations due to low MC event counts.The vertical lines indicate the start of the three anomaly regions (ARs).The labels of the three ARs indicate the visible cross section for hypothetical processes yielding the same number of events as observed in the 140 fb −1 dataset.The AE is applied to preselected events without any requirements on invariant mass distributions.
have larger anomaly scores.Furthermore, three anomaly regions (ARs) are chosen to maintain sensitivity to different BSM models.They are defined by logðLossÞ > −9.1, > −8.0, and > −6.5, respectively, as indicated by the vertical lines in Fig. 1.The labels for the three ARs indicate the visible cross section for hypothetical processes yielding the same number of events as observed in the 140 fb −1 dataset.The anomaly score distributions are consistent across the data-taking years, indicating that the AE training is robust against different pileup conditions and triggering criteria.
The nine invariant mass (m) spectra in each anomaly region are examined to search for any localized excesses above the background hypothesis.The bin widths chosen for the spectra increase from 16 GeV to 150 GeV over the mass range of 0.3-8 TeV to reflect the jet energy resolution of the ATLAS detector.As in the previous ATLAS searches based on the single-lepton requirement [46,47], the following analytic function is used to describe the shape of the smoothly falling background: where x ≡ m= ffiffi ffi s p and p i are free parameters.To avoid any possible bias, the analysis was developed without looking at the signal mass distributions in the data.To verify that Eq. ( 1) can describe the background shape separately for each of the nine invariant mass distributions, fits are performed on the SM background.This background is estimated from the MC simulation samples, which are composed of W þ jets, t t and single-top processes, and FIG. 2. Invariant mass distributions of jet þ Y for m jY > 0.3 TeV in the 10 pb AR along with the fit of Eq. ( 1).The fits are represented by the lines, while the associated statistical uncertainties are indicated by the shaded bands.The lower panels show the bin-by-bin significances of deviations from the fit, calculated as ðd i − f i Þ=δ i , where d i is the data yield, f i is the fit value, and δ i is the data uncertainty in the ith bin.
PHYSICAL REVIEW LETTERS 132, 081801 (2024) 081801-4 a "loose electron" control sample from data, which represents the QCD multijet events.Each event in the control sample must contain a leading electron that satisfies the "loose", but not the "tight", identification criteria [27], while also passing the anomaly region selections defined by the 1% data trained autoencoder.This ensures that the control sample is orthogonal to the signal sample.The fit results show that Eq. ( 1) can describe the shape of the nine invariant mass distributions in the anomaly regions, with the χ 2 =ndf values ranging between 0.7 and 1.9.The distributions of the fit residuals are consistent with the normal distribution.More complex fit functions do not improve this description.These studies do not take into account systematic uncertainties in the SM simulations.The established functional form of the background hypothesis is then applied to data in the anomaly regions.
Likelihood fits of Eq. ( 1) to the data are performed to determine the free parameters and are shown in Fig. 2 for the 10 pb AR.The data are well described by the fit, with the χ 2 =ndf values ranging between 0.64 and 1.19 for all nine mass spectra.The lower panels show the fit significances, calculated as ðd i − f i Þ=δ i , where d i is the data yield, f i is the fit value, and δ i is the data uncertainty in the i th bin.The distributions of the significances are found to be consistent with normal distributions.
Each invariant mass spectrum is analyzed independently using the BumpHunter algorithm [53] to look for localized excesses without assumptions about BSM signal shapes.The uncertainty in the background shape is taken into account by using an alternative function that replaces the term p 5 ln 2 x with p 5 = ffiffi ffi x p in Eq. ( 1).The algorithm is used to search for statistically significant excesses, taking into (5) a simplified dark-matter model with an axial-vector mediator Z 0 → q q, where one of the quarks radiates a W boson decaying to lν.The multiple markers shown for the composite-lepton model at the same invariant mass values correspond to different composite lepton (E) masses between 0.25 and 3.5 TeV.The center positions of the markers are set to the masses of the corresponding heavy particles.
account the look-elsewhere effect [54] of the full mass spectrum under study and the uncertainty in the background shape.Among the nine invariant mass distributions in the 10 pb AR, BumpHunter finds the largest excess to be in the m jμ spectrum near 4.8 TeV, and the second largest to be near 1.2 TeV, as shown in Fig. 2. Their statistical significances when including systematic uncertainties are discussed below.
Searches for localized excesses were also performed in the nine invariant mass distributions for the 1 pb and 0.1 pb ARs.No significant excesses were found.In particular, the 4.8 TeV region of the m jμ spectra does not show any deviation from the background hypothesis in these two AR regions.
The anomaly region selection is expected to improve the discovery potential for BSM models.Figure 3 shows the improvements in discovery sensitivity for the five benchmark BSM models in the 10 pb AR.The improvement is defined as ΔZ ¼ ½ðZ AE =ZÞ − 1 × 100%, where Z AE is the discovery sensitivity in the anomaly region and Z is that before any cut on the anomaly score.The discovery sensitivity is defined as , where s is the number of signal events and b the number of background events calculated from the fit of Eq. ( 1) to the data.Both are counted in a AE1-standard-deviation region around the mean value of the reconstructed signal's mass distribution.The results show that using the AE trained on data improves the discovery sensitivity for most of the benchmark BSM models and mass points.The BSM hypotheses that do not show improvements in sensitivity correspond to models with lowmass hypothetical particles or models where the number of reconstructed objects and their kinematic-variable values in  [pb] [pb] FIG. 4. The 95% C.L. upper limits on the cross section times acceptance (A), efficiency (ϵ), and branching ratio (B) for Gaussianshaped signals with different signal widths.The limits are calculated for events with at least one lepton with p l T > 60 GeV in the 10 pb AR.Two width hypotheses are shown: σ X =m X ¼ 0 and σ X =m X ¼ 0.15.In both cases, the detector resolution for jets is included in the simulation of signal samples.The AE1σ and AE2σ bands around the expected limit are shown for σ X =m X ¼ 0 signals.Mass points are spaced 5% apart, relative to the preceding point, starting at 0.3 TeV.
the RMM are closer to those of SM events.The 1 and 0.1 pb ARs show less improvements in the sensitivity compared to the 10 pb AR due to significant decrease in statistics.
In the absence of any significant resonant signals, upper limits on the cross section times acceptance, efficiency, and branching ratio are set for Gaussian-shaped signals using the CL s method [56].The effects of systematic uncertainties associated with the luminosity, jet energy scale, jet energy resolution, and object identification and reconstruction are included in the signal.These objectrelated uncertainties are derived from the SSM simulated signals.The uncertainty stemming from the choice of background fit function is taken from the difference of results obtained using Eq. ( 1) and the alternative background shape.An uncertainty included to account for a fit bias is computed using the spurious-signal method described in Ref. [57].This is the dominant source of systematic uncertainty and it weakens the upper limits by approximately 10%.The stochastic fluctuations in the AE training also produce an uncertainty if the AE is retrained.It can cause a change of the observed limits by 4% on average [58].
Upper limits at 95% confidence level (C.L.) on the production cross section of resonant signals with an intrinsic width of 0% or 15% of the hypothetical mass in the 10 pb AR are presented in Fig. 4. For a 0% width, the Gaussian signals have widths consistent with the experimental resolutions.The limits are weaker for a 15% resonance width because the signals are spread over more bins.The local significance of a signal with a 0% width for m jμ ¼ 4.8 TeV, where the largest excess was detected by BumpHunter, is found to be 2.9 standard deviations (2.9σ) using the asymptotic formulae [59].A more stringent ("HighPT") identification criterion for muons [28] removes the event near m jμ ¼ 8 TeV, but it does not reduce the significance of the excess at 4.8 TeV.The local significance at around m jμ ¼ 1.2 TeV is 2.8σ.
The presented limits are more stringent than the limits on Gaussian signals presented in Ref. [46], which used the same preselected events.For example, the limits for m jj below 1 TeV reported in this analysis show about a factor of 2-3 improvement compared to the results in Ref. [46].The search sensitivity is comparable at high masses.Many BSM models with complex final states are expected to have high AE acceptances [60].For example, the five benchmark BSM models shown in Fig. 3 have an average AE acceptance above 80% for m jj masses around 0.5 TeV in the 10 pb AR.The trained AE rejects about 90% of the data in the same m jj region.This illustrates the potential to set exclusion limits on heavy BSM particles with high AE acceptance.
In conclusion, model-independent searches for new phenomena in two-body invariant mass distributions are performed by applying an unsupervised anomaly-detection technique to 140 fb −1 of pp collision data at ffiffi ffi s p ¼ 13 TeV recorded by the ATLAS detector at the LHC.Events are preselected to contain at least one lepton with p l T > 60 GeV and at least one jet with p T > 30 GeV.An unsupervised machine-learning algorithm based on the AE architecture is trained with a randomly selected 1% sample of the preselected collision events, without targeting any specific BSM signal.Three anomaly regions are defined using the reconstruction loss of the AE.Nine two-body invariant mass spectra, m jY , are studied, where j stands for a light jet or a b jet, and Y is a second light jet or b jet, a lepton, or a photon.No evidence of significant excesses is observed.The largest excess reported by BumpHunter is near 4.8 TeV in the m jμ distribution for the 10 pb AR.Assuming a resonance with a width of 0%, the local significance of this excess is 2.9σ.
The discovery sensitivity shows a large improvement after the anomaly region selection, which is illustrated using several benchmark BSM models in the 10 pb AR.The analysis sets 95% C.L. upper limits on contributions from generic Gaussian signals to the studied invariant mass distributions in this AR.Compared to previous limits without anomaly region selections, the reported modelindependent limits have a stronger potential to exclude generic heavy states with complex decay modes.

FIG. 1 .
FIG.1.Distributions of the anomaly score from the AE for data and five benchmark BSM models.Their legends, from top to bottom, are: (1) charged Higgs boson production in association with a top quark, tbH þ with H þ → t b; (2) a Kaluza-Klein gauge boson, W KK , with the SM W boson and a radion ϕ; (3) a Z 0 boson decaying to a composite lepton E and l, with E → Zl with a mass of 0.5 TeV; (4) the SSM W 0 → WZ 0 → lνq q; (5) a simplified dark-matter model with an axial-vector mediator Z 0 → q q, where one of the quarks radiates a W boson decaying to lν.The BSM predictions represent the expected number of events from 140 fb −1 of data for heavy particle (H þ , W KK , Z 0 , W 0 , and Z 0 , respectively) masses around 2 TeV.The distributions for the BSM models are smoothed to remove fluctuations due to low MC event counts.The vertical lines indicate the start of the three anomaly regions (ARs).The labels of the three ARs indicate the visible cross section for hypothetical processes yielding the same number of events as observed in the 140 fb −1 dataset.The AE is applied to preselected events without any requirements on invariant mass distributions.

FIG. 3 .
FIG. 3. Values of ΔZ for the discovery sensitivity, as defined in the text, as a function of the invariant mass m.The nine invariant mass distributions are calculated in the 10 pb AR.Positive percentages indicate improvements in sensitivity.Horizontal dashed lines are drawn at 100% and 200% to guide the eye.The five benchmark BSM models are (1) charged Higgs boson production in association with a top quark, tbH þ with H þ → t b; (2) a Kaluza-Klein gauge boson, W KK , with the SM W boson and a radion ϕ;(3) a Z 0 boson decaying to a composite lepton E and l, with E → Zl; (4) the SSM W 0 → WZ 0 → lνq q; (5) a simplified dark-matter model with an axial-vector mediator Z 0 → q q, where one of the quarks radiates a W boson decaying to lν.The multiple markers shown for the composite-lepton model at the same invariant mass values correspond to different composite lepton (E) masses between 0.25 and 3.5 TeV.The center positions of the markers are set to the masses of the corresponding heavy particles.