Combining Resonant and Tail-based Anomaly Detection

In many well-motivated models of the electroweak scale, cascade decays of new particles can result in highly boosted hadronic resonances (e.g. $Z/W/h$). This can make these models rich and promising targets for recently developed resonant anomaly detection methods powered by modern machine learning. We demonstrate this using the state-of-the-art CATHODE method applied to supersymmetry scenarios with gluino pair production. We show that CATHODE, despite being model-agnostic, is nevertheless competitive with dedicated cut-based searches, while simultaneously covering a much wider region of parameter space. The gluino events also populate the tails of the missing energy and $H_T$ distributions, making this a novel combination of resonant and tail-based anomaly detection.


I. INTRODUCTION
The absence of new physics at the LHC is an enduring mystery.Many well-motivated theoretical frameworks such as supersymmetry, extra dimensions, and composite Higgs have predicted signatures of new particles at the weak scale, yet countless searches for these new particles have not found any significant evidence for them to date.
Nearly all of these searches for physics beyond the Standard Model (BSM) are model-specific to some degree, optimized for specific signal scenarios, often using simulations.It is highly likely that these searches have not thoroughly covered the full phase space at the LHC, leaving a real possibility of new physics simply hiding in the data at the LHC, undiscovered because we have not searched for it.
Recently there has been considerable interest in developing more model-agnostic search strategies for the LHC [1][2][3].In particular, a lot of activity has focused on "resonant anomaly detection" methods [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].In these approaches, one singles out a specific kinematic feature (e.g. the invariant mass of something in the event) in which new physics is postulated to be localized (resonant) to a window.This window serves as the signal region (SR) of the anomaly search.Then one uses the sidebands and modern machine learning techniques to learn a multivariate, data-driven background template in additional features x.Finally, one employs further techniques (such as a classifier) to learn the difference between the background template and the data itself in the SR, in the form of an anomaly score If p bg (x) is the true background density and the classifier is optimal, this is the Neyman-Pearson optimal (idealized) anomaly detector in the SR.By cutting on R(x), one can greatly enhance the significance of any resonant new physics in the SR.So far this activity has almost exclusively focused on new physics that is fully localized -both in the SR and in the features x -and using a global resonant feature such as the invariant mass of a dijet system.Here we point out that the resonant anomaly detection technique is more general and both assumptions can be easily relaxed.
First, resonant anomaly detection methods can be applied to any resonant feature in the event, as long as the background satisfies the assumption of smoothness in that feature.One strong motivation for considering this broader perspective is that in many well-motivated models, such as those for the electroweak hierarchy, highly boosted resonances (either Z/W/h from the SM or additional BSM particles such as a heavier Higgs boson) can be quite common in the decays of heavier particles.
Additionally, we seek to broaden the scope of resonant anomaly detection in this work by pointing out that the signal need not be localized in all (or any) of the features x; it can also appear on the tails of the x distribution (although of course the signal needs to be distinguishable in some of the features).This is a feature of resonant anomaly detection that has not been utilized so far.Anomalies on the tails of distributions such as p miss T , H T and M ef f are quite common and plausible in models of TeV-scale new physics.
In this work, we illustrate this broader application of resonant anomaly detection using a supersymmetric (SUSY) scenario as a well-motivated example.This SUSY scenario consists of gluino pair production, with the gluinos decaying to neutralinos plus a pair of (light quark) jets, and the neutralino decaying to another neutralino (LSP) through an on-shell Z boson, as shown in figure 1.The LSP neutralino is much lighter than the second neutralino, meaning the Z's are highly boosted.Therefore every event has two boosted Z's, jets, and missing energy.CMS previously searched for this signal with a cut-based analysis [23].It defined a series of SRs requiring leading and subleading AK8 jets within m ∈ [70, 100] GeV and considering p miss T in different exclusive bins.The background was estimated in two steps.First, the total number of events B norm in the SR was determined using sidebands in the leading AK8 jet mass (the subleading AK8 jet was required to be in the SR).Then the distribution in p miss T bins (shape) is determined using the p miss T distribution in control regions defined by requiring both AK8 jets to be outside the SR, renormalized to B norm .
Here we point out that we can use one of the Z's to define a SR for resonant anomaly detection, and then we can use the rest of the kinematic variables (m jet of the subleading AK8 jet, p miss T , H T , etc) to play the role of x in resonant anomaly detection.We show that this allows for a potentially more expansive and model-agnostic search, while not sacrificing much in sensitivity to the original SUSY signal.We illustrate this with additional SUSYmotivated scenarios (different decay branching ratios to h and Z), as well as hypothetical nonminimal scenarios involving non-SM resonances.
Notably, all of these scenarios have p miss T , and in fact the p miss T is essential to suppress the resonant backgrounds from SM Z/W +jets with hadronically decaying Z/W .This leads to a novel combination of a resonant and nonresonant anomaly detection strategy. 1This is also the first application of model-agnostic strategies to the SUSY domain and opens up the potential for many more new avenues in the search for SUSY and other wellmotivated top-down scenarios.Our method should be contrasted with existing ML-based approaches to SUSY 1 See [24] for a different, fully nonresonant application of weakly-supervised anomaly detection to the jet constituents of the monojet+p miss T final state.Motivated by (nonresonant) dark showers, they did not obtain their background templates from sidebands in the jet mass; instead they considered an idealized (perfect) background template from simulated Z(νν)+jets events.
The outline of our paper is as follows: Section II describes how the signal and background processes are simulated.In section III we summarize the steps involved in CATHODE.We show the results of applying CATHODE to different signal processes in section IV.We conclude in section V. Finally, in two appendices we describe our recasting of the LHC analysis, and the CATHODE receiver operating characteristic (ROC) curves for various signal models.

II. DATA
Since all the methods described here (both the CMS search and CATHODE) fully rely on data for estimating backgrounds (aka are "fully data-driven"), the simulation data we generate here is meant to play the role of real data, and all background estimates and significances etc we derive are meant to illustrate the result one would get applying these methods to collider data.There will be no events generated here that play the role of simulations at the LHC.
For Standard Model (SM) background data, we take into account the three largest contributions of background events to the CMS search, arising from Z + jets, W + jets and t t+ jets.W and Z events were generated with one to four additional final state partons while t t were generated with up to 3 additional partons.
For the benchmark signal (to be used to compare the performance of the CMS search vs. the CATHODE method), we follow the CMS search and generate gluino pair production (with 0 to 2 additional partons), with subsequent cascade decay pp → g g, g → q q χ 0 2 , χ 0 2 → Z χ 0 1 where the neutralino χ 0 2 is the next to lightest supersymmetric particle (NLSP) and χ 0 1 is the lightest supersymmetric particle (LSP).The mass splitting between the gluinos and NLSP is set to 50 GeV while the LSP mass is 1 GeV.This results in soft jets from the first step of the decay and a highly boosted Z boson.The LSP escapes the detector and contributes large amounts of missing energy.
Later we will also consider decays of χ 0 2 to X χ 0 1 where the X is either a Standard Model Higgs boson or a new Higgs boson with mass besides 125 GeV like the new Higgs bosons in supersymmetric extensions of the Standard Model.The Standard Model Higgs boson decays in ∼ 58% of cases to b b while for the latter case we set the branching ratio to 100%.
All events are generated with Mad-Graph5 aMC@NLO 3.2.0 with √ s = 13 TeV.The NNPDF3.1LOPDF set [30] is used throughout.At the generator level a minimum H T cut of 250 GeV is imposed.Gluinos are decayed spin uncorrelated with Madspin [31] to q q χ 0 2 via an off-shell squark and subsequently χ 0 2 → X χ 0 1 .Showering is done using Pythia 8.306 [32] with MLM [33,34]  CP5 was used for background events while CP2 [35] was used for the signal samples.The number of background events in each channel is scaled to match their respective next-to-leading-order cross sections [36].
Detector effects are simulated using Delphes 3.5.0[37] with the delphes card CMS.tcl detector card modified to account for the lepton isolation criterion.Particles are clustered into jets using the anti-k T clustering algorithm with cone-radius parameter R = 0.4 for AK4 jets and R = 0.8 for AK8 jets.To be considered jets have to have p T > 30 and |η| < 2.4.
The following selection criteria are imposed for both the classical CMS-recast and the dataset for CATHODE: (1 − cos(ϕ miss − ϕ track ) < 100 GeV and p T > 5 GeV for tracks identified as an electron/muon or else 10 GeV.

at least 2 AK8 jets with p T > 200 GeV
The number of background events that pass this baseline selection is shown in the first line of table I.In total, the dataset is composed of 107,421 background events corresponding to L int = 300 fb −1 after cuts 1-7.Signal events are injected according to the gluino-pair production cross section.
Figure 2 shows that the feature m J1 is smooth for the background while it is resonant for the signal.(Hadronically decaying W 's and Z ′ s are eliminated by the requirements on p miss T .)This is a necessary feature for the application of the CATHODE method employed in section III. Figure 3 shows that the signal of new physics is found on the tail of the p miss T distribution, while the background peaks at lower p miss T .We will show that the powerful discriminator p miss T can be leveraged by CATH-ODE even though the signal is found on the tail of the distribution.

III. CATHODE
Here we recap the main points of the inner workings of Classifying Anomalies THrough Outer Density Estimation CATHODE (for more detail see [14]).In very broad strokes, CATHODE aims to learn the density of background events in a signal-depleted region and estimates the density inside the signal enriched region by interpolation.Then, artificial samples are generated in that region, which should follow a signal-depleted distribution.Using a classifier, which is trained to distinguish between the artificial and real events, we can approximate the likelihood ratio (1).This would be the ideal (optimal) model-agnostic anomaly detector, as it is monotonic with p signal (x)/p bg (x) for any signal (since p data (x) is an admixture of p signal (x) and p bg (x)) [38].This allows CATHODE to classify data events as background-like or signal-like.The whole method works by learning directly form data.The training and model selection of both the density estimation and classification is completely agnostic of any signal truth label.
In this study, the events are represented as the tuple m J1 and x with where J 1 , J 2 are the leading/subleading AK8 jets and τ 21 = τ 2 /τ 1 is the ratio of n-subjettiness variables [39].
To compare the technique to the classical search more directly we also consider the reduced set of features so that CATHODE only gets to use the same information.We use a slightly modified version of the original repository2 to allow for any dimension for x.

A. Data preparation and density estimation
First, one defines the signal region (SR) as an interval in m J1 where the signal is expected to be concentrated similar to a classical bump hunt.The complement of the SR defines the sideband (SB).As in any bump hunt, the SR window has to account for the position and the width of the signal bump.Because the reconstructed jet mass is not distributed symmetrically around the mass m of the mother particle (which is the Z, the Higgs or a BSM Higgs in this paper), we chose parameterization We estimate the mass resolution to σ m = 15% and round the window to the closest GeV.set (25%)¸used to select the models used in the next steps.To address the finite number of real SB events we use leave-one-out cross validation such that we get four datasets with nonoverlapping validation sets.The data is transformed (preprocessed) for easier learning by shifting and scaling the observables in x to fit the interval (0, 1), then applying a logit tranformation3 , and again shifting and scaling to unit standard deviation and zero mean.
For density estimation, a Masked Autoregressive Flow (MAF) is used with affine transformations [40].The MAF constructs invertible transformations with tractable Jacobians that map a simple multidimensional distribution (e.g.multiple Gaussians as is considered here) to the target density, in this case the conditional probability p data (x|m J1 ∈ SB).The MAF uses 15 blocks of Masked Autoencoder for Distribution Estimation (MADE) [41] to learn the transformations.The number of events it is trained on depends on the signal region but is typically of the order of 10 5 . 4Training is done with the hyperparameters listed in tab.II.
After training the ten epochs with the lowest validation loss are selected for the sampling step.

B. Sampling SR events
The next step aims to sample synthetic events inside the SR using the four density estimates of the last step.Kernel density estimation with Gaussian kernel and bandwidth of 0.01 is used to model the m J1 distribution inside the SR.This is then used to sample N = 1, 000 events from each of the ten DE models which are combined, shuffled and split between the training set (60%) and validation set (40%) for the next step.The training and validation sets of all four density estimators are combined respectively to form the synthetic dataset with a total of 40,000 events.Compared to the roughly 10,000 real events in the SR (see Table I second line) this is intentionally oversampled to improve the classification performance [14].Setting N even higher did not improve results systematically.The synthetic background events and the real SR events are then standardized in the SR without the logit transformation.
The distributions of the synthetic events are shown in orange in figure 3.In all of our models the signal is located in a resonance in m J2 and in the tail of the p miss T distribution.The density estimation has to model the shape reasonably well so this powerful classification feature can be leveraged.This is accomplished successfully as shown in figure 3.

C. Classifier and anomaly detection
Now a classifier is trained on both the synthetic and real SR dataset to distinguish the sampled events, which should follow the background distribution, from the real events, which additionally might contain events following the signal distribution.
The classifier consists of 3 hidden layers with 64 nodes and ReLU activation each and it is optimized using the hyperparameters given in tab.III.Because the datasets are imbalanced, a weight is assigned such that both classes contribute equally to the loss.Since in a realistic example the number of events to train and validate on is limited, we employ an additional step of leave-one-out cross validation.The real SR data is partitioned into four subsets of equal size.In each subset, one quarter of the real events are held back as a test set for the anomaly detection while the remaining 75% are split between the training set (60%) and the validation set (40%).(The synthetic background events are also split into train/val sets with the same proportions.)After training, the ten model states with the lowest validation loss are selected and evaluated on the test set.The predicted labels are then averaged over the models and assigned as anomaly scores to the events.This is repeated for the next quarter of the SR data, and so on, until every event in the SR is assigned an anomaly score.
To reduce the statistical effects of severely overperforming and underperforming models, each dataset is shuffled 5 times to allow different selections.Then the entire process of the preceding paragraph is repeated to produce 5 different anomaly scores.All 5 anomaly score assignments are averaged to produce a final, more robust score.
Finally, to even out the influence of signal-event selection, everything is repeated ten times with differing independent sets of signal events.In all the results we report below, we will report the mean and standard de-viation of these ten different trials.
The signal to background ratio is improved by cutting on the anomaly score above a critical value R c . Figure 4 shows the distributions of the anomaly score R for the signal and background.No additional selections are performed.In a real application one would perform statistical inference by means of a bump hunt on the R distribution which is beyond the scope of this work.Instead the performance is evaluated using the nominal significance Z = S/ √ B with S (B) the number of signal (background) events after imposing this cut.This makes use of the truth labels which an experiment would have to replace by other means of background estimation.One still has to chose a strategy to set R c .In the following we will show the signal significance with R c set to maximize Z with at least 5 background events left to show the best performance one could hope for.Since a real application does not have access to the truth labels this is not immediately applicable.To show a more realistic method we also show the performance where R c is set so that 1% of SR events pass the cut while also containing at least 5 background events.

A. Nominal signal model
We first turn our attention to the nominal signal model where χ 0 2 → Z χ 0 1 .This is the signal model the dedicated CMS search [23] was aimed at.

Three features
We start by using the limited feature set x = m J2 , p miss T , H T so CATHODE does not have access to more information than the classical search.To compare with CMS we calculate the signal significance for events inside the signal region m J1/J2 ∈ [70 GeV, 100 GeV] with the b-veto mentioned in section A 1 applied.Since the search gets most of its sensitivity from the highest p miss T bins, we apply an additional cut p miss T > 800 GeV. 5 This leads to roughly the same number of events as when only the top 1% of events are kept for CATHODE.For a gluino mass with sizable cross section like 1700 GeV the classical search yields on average for ten independent signal injections Z = 20.Using CATHODE with 3 features the significance is on average Z = 34 ± 2.
Evidently, CATHODE outperforms the classical approach, even though CATHODE is more model agnostic.The reason is that the classical approach, being cut based, misses correlations between the features that the multivariate classifier of CATHODE can pick up.
To confirm this, we also investigated the sensitivity of a fully supervised approach, using the same classifier architecture and hyperparameters as that of CATHODE.The training data for the fully supervised classifier consists of an additional 300 fb −1 background events and 10,000 signal events.60% of this dataset is used in training while the remaining 40% is used as a validation set to select the best performing model.Evaluating this classifier again with selecting only the top 1% of anomaly scores results in a significance of on average Z = 33 ± 4. We conclude that CATHODE is saturating the performance of the fully-supervised classifier for this amount of signal (unsurprisingly, since this is a lot of signal), and that the deep neural networks of both CATHODE's classifier and the supervised classifier can leverage correlations to improve the signal significance significantly over the classical approach.

Five features
From now on we will use the five features m J2 , p miss T , H T , τ J1 21 , τ J2 21 because the subjettiness variables τ 21 are useful discriminants.Figure 5 shows CATH-ODE's performance compared to the classical strategy.We see that in the relevant region at high gluino masses the conservative cut on R (allowing only the top 1% to pass) reaches only slightly weaker results.We identify the mass where the signal significance is Z = 1.645 with the expected 95% limit on the mass in a real application [42].The conservative cut on R alone excludes gluino masses up to m g = 2066 GeV.This is only slightly weaker than the expected excluded mass of m g < 2145 GeV for a dedicated search at this integrated luminosity.This is expected because a model specific search will be fine-tuned to the specific process while CATHODE is intentionally 5 Technically, the original CMS search uses p miss T -bins, and most of the sensitivity comes from the three highest bins, 800-1000 GeV, 1000-1200 GeV and larger than 1200 GeV, where the background is comparable or subdominant to the signal hypothesis.To get a fair comparison with CATHODE we replace this with a single cut.kept more general.CATHODE's strength lies in this generalization as it is able to detect different models without the need to tweak the approach as we will show in the following sections.

B. Alternate signal model: decays to SM Higgs
Now we turn our attention to another model, where the neutralinos decay via χ 0 2 → h χ 0 1 where h is the 125 GeV Standard Model Higgs boson.All that has to be done for CATHODE is select a new signal window around 125 GeV.A scan over the gluino mass is shown in figure 6.A b-jet selection criterion would be beneficial in this case, but we omit this to keep CATHODE as general as possible.Even without the b-tag CATHODE still generates a sizable signal significance for gluino masses comparable to the expected excluded value.While the dedicated search is expected to exclude gluino masses below 2355 GeV, CATHODE with the 1% cut reaches Z > 1.645 for all masses up to 2233 GeV.With the best possible cut on R this can be pushed to 2300 GeV.As expected CATHODE results in slightly weaker bounds.The opportunity cost of this is significantly lower than a specialized search.The only change in the approach is the choice of the signal region.The intended use of CATHODE scans the signal region over the entire mass range, such that both the decay to Z and Higgs bosons would be included automatically in this strategy without any extra considerations.
C. Alternate signal model: mixed Z/h decays Setting the branching ratio of the χ 0 2 → h χ 0 1 or χ 0 2 → Z χ 0 1 decays to 100% is a rather unnatural choice.Therefore we also show CATHODE's performance for a model where both branching ratios are 50%.This time the anomaly detection has to find two bumps simultaneously.For this we chose the signal window to contain both resonances: m J1 ∈ [70 GeV, 140 GeV].The results of a scan over the gluino masses is shown in figure 7.This time CATHODE seems to outperform the extrapolated bound from the dedicated search [44].The extrapolation from 35.9/fb to 300/fb integrated luminosity is quite far and should be taken with a grain of salt.The dedicated search classifies events in 0,1 and 2 Higgs categories using b-tags.The signal model populates all categories simultaneously.The approach using CATHODE only uses the single signal region without further thought to generate these results.
In figure 8 we show that CATHODE is indeed capable of recovering both bumps corresponding to the decay into Z and Higgs bosons respectively.
Figure 9 shows that CATHODE is very robust against changed in branching ratios.We vary the branching ratio Br( ) and calculate the significance.Regardless of branching ratio, the multiplicative gain of significance by applying the technique is always between 5 and 6.This shows the real strength of the CATHODE approach over the The signal window is set as mJ 1 ∈ [100 GeV, 140 GeV].For the blue line Rc is set to allow 1% of events to pass this cut while the orange line omits the cut completely.The dot-dashed part of the blue line represents parameter points where Rc has to be lowered to allow 5 background events.The shaded region shows one standard deviation around the mean S/ √ B obtained from ten different signal injections.The vertical black line at 2355 GeV indicates gluino mass that is expected to be excluded by rescaling the (expected) limit from a dedicated CMS search for this decay [43] from 137/fb to 300/fb integrated luminosity.There is no red line corresponding to the classical search (as in Fig. 5) because we did not perform a detailed recast of [43].
dedicated searches [23,43,44].With the enlarged SR that covers both decay modes, CATHODE only needs to be trained once, independent of the assumption on the BRs, compared to performing a dedicated analysis for each BR assumption.

D. Alternate signal model: decays to BSM Higgs
Until now we applied CATHODE only to models where the position of the bump is known beforehand.But one strength of the technique is that we do not even need to know that.To discuss this further we now focus on another model that induces the neutralino decay χ Sensitivity of CATHODE and the classical strategy.The signal window is set as mJ 1 ∈ [70 GeV, 140 GeV].For the blue line Rc is set to allow 1% of events to pass this cut while the orange line omits the cut completely.The dot-dashed part of the blue line represents parameter points where Rc has to be lowered to allow 5 background events.The shaded region shows one standard deviation around the mean S/ √ B obtained from ten different signal injections.The vertical black line at 2060 GeV indicates gluino mass that is expected to be excluded by rescaling the expected excluded cross section obtained by the dedicated CMS search for this decay [44] from 35.9/fb to 300/fb integrated luminosity.the given choice of features.For this we perform a parameter scan over m H from 35 GeV to 515 GeV in 10 GeV steps shown in figure 11.The method reaches reliably signal significances of order ten up to m H ∼ 350 GeV without using b-tags as otherwise powerful discriminators.

V. CONCLUSIONS
In this paper, we have shown how recently developed techniques for weakly supervised resonant anomaly detection can be easily extended to cover anomalies that also live on tails of distributions.This situation commonly arises in well-motivated weak-scale scenarios such as SUSY, where the cascade decays of heavier BSM particles can produce resonances such as Z's and Higgs bosons, while simultaneously populating the tails of fea-  tures such as p miss T and H T .As long as the signal is localized in one feature where the background is smooth, resonant anomaly detection can be brought to bear on these additional features in order to enhance the sensitivity to signal.
As a proof-of-concept demonstration, we applied the state-of-the-art anomaly detection method CATH-ODE [14] to the SUSY scenario pp → g g, g → q q χ 0 2 , χ 0 2 → X χ 0 1 where X is either a Z boson, Standard Model Higgs, or an additional (N)MSSM Higgs boson.Despite being model agnostic, we showed that the CATH-ODE method is competitive with existing, dedicated, cut-based searches [23,43,44], because -being inherently multivariate -it takes advantage of correlations between features.Moreover, whereas each decay scenario required a separate, optimized analysis, CATHODEbeing model agnostic -is able to simultaneously target them all.
In this work we considered two different feature sets for the CATHODE algorithm, as shown in eqs.( 2) and (3).These were motivated by the SUSY scenarios we considered, and it would be interesting to generalize our study beyond these feature sets, both to increase the degree of model agnosticness of the method, and possibly to enhance the sensitivity to the SUSY signals considered here.For example, our benchmark signals all come with ∼ 4 additional jets from the gluino decay, and their detailed kinematic distributions (instead of just the aggregate feature H T ) may offer additional discriminating power versus the QCD background.Adding features related to additional jets in the event may also give us more sensitivity to spectra not explicitly considered here, for example where the NLSP mass is not so close to the gluino mass.As long as m LSP + m Z ≪ m g , the Z will still be boosted, but the extra jets will get harder as m LSP moves away from m g .
All in all, using modern methods for resonant anomaly detection such as CATHODE allows for a broader and more efficient coverage of the parameter space of physics beyond the Standard Model.With much more data on the way, methods like these should prove indispensable for maximizing the discovery potential of the LHC.

VI. ACKNOWLEDGMENTS
DS is supported by DOE grant DOE-SC0010008.CK would like to thank the Baden-Württemberg-Stiftung for financing through the program Internationale Spitzenforschung, project Uncertainties -Teaching AI its Limits (BWST IF2020-010).GK acknowledges support by the Deutsche Forschungsgemeinschaft under Germany's Excellence Strategy -EXC 2121 Quantum Universe -390833306.

Appendix A: Recasting CMS
We describe how the background samples were simulated as closely as possible to an existing search and verified.We recreate the CMS SUSY search CMS-SUS-19-013 [23].

Recreating CMS-SUS-19-013
The recreation of CMS-SUS-19-013 [23] follows the most important analysis steps of the original publication.The number of events is set to the integrated luminosity of L int = 137 fb −1 .First, a set of remaining cuts are applied to select Z candidates, then the background estimation is recreated before the statistical analysis is performed.The following cuts are applied to select hadronically decaying Z bosons: The resulting p miss T spectrum is shown in figure 12 which agrees with the spectrum shown in the original publication within uncertainties.The background estimation consists of the normalization and the shape estimation.The signal region (SR) is defined as m jet ∈ [70 GeV, 100 GeV].First one demands the subleading AK8-jet to be in the SR.Then a linear function is fitted to the m jet spectrum of the leading AK8 jet outside its SR.The nominal yield B norm is obtained by integrating the linear function in the SR.The statistical error of the yield is obtained from the spread of pseudoexperiments sampled from the fit.Additionally to the linear function Chebychev functions up to the fourth order are fitted.The largest deviation of the nominal yield is then assigned as an additional uncertainty.
The background p miss which agrees with the original publication within uncertainties.The expected background in bin i is then RooStats [45] is used for statistical modeling.It takes N SB i with statistical errors, T and ∆T to model the background in the SR with uncertainties.The signal model contains signal events that pass all cuts and is rescaled to the approximate NNLO+NNLL cros section [46].The overall uncertainty of the cros section is applied to all signal bins.The resulting statistical model is then evaluated with the CL s approach and the asymptotic form of the onesided profile likelihood test statistic.This is used to obtain the 95% C.L. cros sections.The limits are shown in figure 13 for the integrated luminosity L int = 137 fb −1 and in figure 14 for L int = 300 fb −1 .We use the latter dataset for the application of the ML technique since the accuracy is greatly improved with more data points to learn on while in reach for the collider in the near future.In figure 15 we show ROC curves, i.e. background suppression as a function of signal efficiency of our bench-mark models.We also observe a common feature of anomaly detection techniques.With rising signal cross section the classifier learns to better separate background from signal-like events.At the same time, larger signal cross sections correspond to smaller gluino masses, which in turn leads to less expressive features.Both effects combined lead to intermediate gluino masses having the largest background suppression at the same signal efficiency compared to small masses with large cross sections or large masses with very obvious signatures, especially in the decay to Z and Standard Model Higgs bosons.We also observe in the bottom right figure that for low and high Higgs masses the background rejection is noticeably weaker than for intermediate masses.For light Higgs masses, the jets are too similar to background jets while high Higgs masses lead to wide jets that get reconstructed incorrectly.

FIG. 2 .
FIG.2.Distribution of the resonant feature mJ 1 for background and signal events in the sideband (SB) and the signal region (SR).The signal corresponds to mg = 1700 GeV.The distributions are scaled to Lint = 300 fb −1 .

FIG. 3 .
FIG. 3. Comparison of the signal and background distribution inside the signal region and the artificial samples.The artificial samples will be discussed in the next section.The signal corresponds to mg = 1700 GeV.The distributions are scaled to Lint = 300 fb −1 .

FIG. 4 .
FIG.4.Normalized distributions of the anomaly score R of the signal and background processes.The signal corresponds the average distribution of ten independent injections with mg = 1900 GeV.

FIG. 5 .
FIG. 5. Sensitivity of CATHODE and the classical strategy.The signal window is set as mJ 1 ∈ [70 GeV, 100 GeV].For the blue line Rc is set to allow 1% of events to pass this cut while the orange line omits the cut completely.The shaded region shows one standard deviation around the mean S/ √ B obtained from ten different signal injections.The dot-dashed part of the blue line represents parameter points where Rc has to be lowered to allow 5 background events.The vertical black line at 2145 GeV indicates gluino mass that is excluded at 95% confidence level by our 300/fb recreation of the dedicated search[23] .The red dot-dashed line is calculated using the classical strategy with m J 1 /J 2 ∈ [70 GeV, 100 GeV], p miss T > 800 GeV and the b-veto.

0 2 → H χ 0 1 where
H is one of the additional Higgs bosons introduces by the (N)MSSM that has a mass different from 125 GeV.Because the decay of H depends on the specific implementation of SUSY-breaking parameters we set the branching ratio BR(H → b b) = 100%.To find the signal, CATHODE is applied to different signal regions given by varying mass hypotheses m in equation 4, scanning the entire mass range in discrete steps and the signal significance is determined.To demonstrate this we chose m H = 100 GeV and m g = 2000 GeV and show the result in figure10.Once the signal window has significant overlap with the signal bump the signal significance gets sufficiently improved to show the presence of anomalous events.In a real application this would then warrant further investigation with a dedicated search.Finally we show how wide the possible choice of m H is that CATHODE can still help to find in our dataset with

FIG. 8 .
FIG.8.The distribution of the data inside the signal region before the anomaly score cut is shown in gray.After selecting the top 1% of events in the SR the remaining signal events are shown in orange while the remaining background events are shown in blue.The signal corresponds to mg = 1700 GeV.

FIG. 9 .
FIG. 9. Sensitivity of CATHODE for varying branching ratios to Z bosons for mg = 2000 GeV.The shaded region shows one standard around the mean S/ √ B obtained from ten different signal injections.

FIG. 10 .
FIG.10.Significance for a parameter scan over the mass hypothesis in 5 GeV steps, when the mass is not known a priori.The shaded region shows one standard deviation around the mean S/ √ B obtained from ten different signal injections.Masses are chosen as m g = 2000 GeV and mH = 100 GeV.

FIG. 11 .
FIG. 11.Parameter scan of mH with m g = 2000 GeV to show which signals CATHODE can help find in the dataset.The shaded region shows one standard deviation around the mean S/ √ B obtained from ten different signal injections.

8 .
Softdropped m jet ∈ [40 GeV, 140 GeV] of the two highest p T AK8 jets 9. ∆R Z,b > 0.8 for the second highest p T AK8 jet Z and any b-tagged jet where the angular separation is defined as ∆R = ∆ϕ 2 + ∆η 2

FIG. 12
FIG.12.p miss T spectrum of the three leading background processes.The background of the same three processes from the CMS publication is shown in red.The variation of cross section due to changing the energy scale by a factor of 1/2 and 2 as computed by MadGraph is assigned as a systematic uncertainty and added to the statistic errors in quadrature and shown as the error bars.

T
shape is obtained by the sideband (SB) with both AK8 jets outside the SR.The content of the ith p miss T bin is denoted as N SB i .The transfer factor from the SB to the SR is then calculated as FIG.13.Recreation of CMS-SUS-19-013[23].The red dashed line denotes the expected limits of original CMS search.The black dashed line shows the expected limits of the recreation.

TABLE I .
merging.Pythia-Tune Number of events passing each selection requirement for Lint = 300 fb−1

TABLE III .
Parameters of the classifier