Anomaly detection search for new resonances decaying into a Higgs boson and a generic new particle $X$ in hadronic final states using $\sqrt{s} = 13$ TeV $pp$ collisions with the ATLAS detector

A search is presented for a heavy resonance $Y$ decaying into a Standard Model Higgs boson $H$ and a new particle $X$ in a fully hadronic final state. The full Large Hadron Collider Run 2 dataset of proton-proton collisions at $\sqrt{s}= 13$ TeV collected by the ATLAS detector from 2015 to 2018 is used, and corresponds to an integrated luminosity of 139 fb$^{-1}$. The search targets the high $Y$-mass region, where the $H$ and $X$ have a significant Lorentz boost in the laboratory frame. A novel signal region is implemented using anomaly detection, where events are selected solely because of their incompatibility with a learned background-only model. It is defined using a jet-level tagger for signal-model-independent selection of the boosted $X$ particle, representing the first application of fully unsupervised machine learning to an ATLAS analysis. Two additional signal regions are implemented to target a benchmark $X$ decay into two quarks, covering topologies where the $X$ is reconstructed as either a single large-radius jet or two small-radius jets. The analysis selects Higgs boson decays into $b\bar{b}$, and a dedicated neural-network-based tagger provides sensitivity to the boosted heavy-flavor topology. No significant excess of data over the expected background is observed, and the results are presented as upper limits on the production cross section $\sigma(pp \rightarrow Y \rightarrow XH \rightarrow q\bar{q}b\bar{b})$ for signals with $m_Y$ between 1.5 and 6 TeV and $m_X$ between 65 and 3000 GeV.


Introduction
The Standard Model (SM) provides a framework for understanding fundamental particles and interactions that has been remarkably predictive of experimental results over several decades. The discovery of the Higgs boson in 2012 [1,2] completed the set of particles predicted by the SM.
The sensitivity of the Higgs boson mass to radiative corrections implies either extreme fine-tuning in the SM or the existence of new particles at an energy scale not far above the Higgs boson mass. This theoretical motivation, coupled with the existing experimental mass reach of the Large Hadron Collider (LHC) at CERN, motivates searches for new particles with $\mathcal{O}$(TeV) masses. Because the Higgs boson couples more strongly to heavier particles, it is natural to expect that these new heavy particles may have decays to a Higgs boson.
A search is presented here for a new TeV-scale narrow-width particle $Y$, which decays into a Standard Model Higgs boson $H$ and a new particle $X$ with a mass near the weak scale. A fully hadronic final state is targeted for both particles. Tagging of the boosted Higgs boson decay into two $b$-quarks ($H \rightarrow b\bar{b}$ tagging) enhances the signal by using the highest-branching-fraction decay of the Higgs boson. A novel jet-level implementation of anomaly detection, built on an unsupervised machine-learning architecture, is used to select $X$ particles solely because of their incompatibility with the expected SM background.
The application of anomaly detection to collider searches is a rapidly growing effort in the high-energy physics community (see Refs. [3][4][5] for overviews). The lack of evidence for new interactions and particles since the Higgs boson's discovery has motivated generic searches to complement the existing rigorous, model-dependent analysis program. Machine learning provides an excellent framework for constructing tools that can isolate events in data solely because of their incompatibility with a background-only hypothesis. Building a tool to perform model-independent classification of collision events involves training on data events, and therefore requires the ability to cope with a lack of labels indicating whether inputs are signal or background. This distinguishes the typical supervised classification problem, where all inputs are labeled with a known origin, from the anomaly detection approach, which makes use of unsupervised (no input labels) or weakly supervised (noisy labels) training. A selection tool constructed in this way has trade-offs: while it broadens sensitivity to many classes of new physics with a single classifier, sensitivity to any specific model is expected to be better with a dedicated supervised approach.
A Feynman diagram for the $Y \rightarrow XH$ signal process is shown in Figure 1, where the $X$ can have a variety of hadronic decays. A nominal benchmark decay into two light quarks, $X \rightarrow q\bar{q}$, is generated to provide an interpretation framework for the results. The generic selection of the $X$ is not strongly dependent on its mass, so the analysis is sensitive to $X$ masses spanning several orders of magnitude, from $\mathcal{O}(10)$ GeV to $\mathcal{O}(1)$ TeV. The masses of the parent and daughter particles yield a kinematic scenario where the final-state particles are highly Lorentz-boosted in the laboratory frame, motivating a "boosted" reconstruction method using large-radius (large-$R$) jets and jet substructure observables to distinguish the boson decay products. An orthogonal "resolved" reconstruction method is used to recover sensitivity to topologies where the $X$ is less boosted and reconstructed as separate small-radius (small-$R$) jets, significantly extending the sensitive phase-space region. Since the signal is resonant, it can be detected with a "bump hunt" on the invariant mass distribution of the reconstructed $X$ and $H$ candidates.
Although model-independent in nature, this search is motivated by several key extensions to the Standard Model which predict heavy diboson resonances. Examples of such theories include extended gauge symmetry models [6], warped extra dimensions [7][8][9], and two-Higgs-doublet models [10]. Signals are generated with a simplified model based on spin-1 Heavy Vector Triplets (HVT) [11] which reproduces a large class of explicit beyond-the-SM (BSM) models. The HVT phenomenology provides a Lagrangian that fulfills SM symmetry constraints with an isospin SU(2) triplet formed of a neutral $Z'$ and two charged $W'^{\pm}$ bosons. This search uses the full LHC Run 2 $\sqrt{s} = 13$ TeV $pp$ dataset collected by the ATLAS detector from 2015 to 2018, corresponding to an integrated luminosity of 139 fb$^{-1}$. A search for the $Y \rightarrow XH$ process was previously performed by ATLAS using 36.1 fb$^{-1}$ of data, assuming $X \rightarrow q\bar{q}$, and found no significant excess for $Y$ masses from 1 to 4 TeV and $X$ masses from 50 to 1000 GeV [12]. In addition to the much larger dataset, the present analysis includes several key improvements relative to the previous search, such as a neural-network-based tagger optimized for the boosted $H \rightarrow b\bar{b}$ topology, anomaly detection for enhanced signal-model independence, and the use of two orthogonal regions to capture both boosted and resolved reconstruction of the nominal $X$ decay into two quarks. The ATLAS Collaboration previously leveraged anomaly detection implemented with weakly supervised machine learning in a search for narrow new resonances in dijet events using the full Run 2 dataset [13]. The CMS experiment performed a multidimensional search for diboson resonances with their full Run 2 dataset of 138 fb$^{-1}$ which includes sensitivity to the signature discussed here [14]. A maximum local (global) significance of 3.6 (2.3) standard deviations was observed for two mild excesses of events, at resonance masses of 2.1 and 2.9 TeV with daughter masses consistent with the $W$, $Z$, or Higgs bosons.
The analysis regions in the present search are built by selecting two large-$R$ jets, with additional criteria to reduce the background to events with Higgs and $X$ particles in the search regions. Three overlapping analysis categories are defined with different $X$ selections, namely the anomaly region and two orthogonal regions optimized for the benchmark $X \rightarrow q\bar{q}$ decay. The degree of model independence of the $X$ tagging is assessed using three simulated signals with different decays of the $X$ particle that lead to differing large-$R$-jet topologies, in addition to the two-light-quark final state used to generate the ($m_Y$, $m_X$) signal grid. The background to the signal process consists primarily of multijet events from quantum chromodynamics (QCD) processes. It is estimated with a fully data-driven method that incorporates reweighting based on a deep neural network (DNN) to ensure good modeling. The final fits are made to the reconstructed invariant mass distribution of the $Y$ candidates in overlapping ranges of $X$ candidate mass to increase the sensitivity to a signal. Since no significant deviation from the estimated background is observed, results are presented as upper limits on the cross section times branching fraction for the generic HVT process.

ATLAS detector
The ATLAS detector [15] is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and nearly $4\pi$ coverage in solid angle.¹ The inner-detector (ID) system is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range $|\eta| < 2.5$. The high-granularity silicon pixel detector covers the vertex region and typically provides four space-point measurements per track, the first hit normally being in the insertable B-layer installed before Run 2 [16,17]. The next detector outward is the silicon microstrip tracker, which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to $|\eta| = 2.0$. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range ($|\eta| < 1.7$). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to $|\eta| = 4.9$. The muon spectrometer surrounds the calorimeters and is based on three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 T·m across most of the detector. The muon spectrometer includes a system of precision chambers for tracking and fast detectors for triggering. A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to accept events at a rate below 100 kHz on average, depending on the data-taking conditions. This is followed by a software-based high-level trigger (HLT) that reduces the rate of selected events to 1 kHz for offline storage. An extensive software suite [18] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and Monte Carlo simulation
This search is performed with $\sqrt{s} = 13$ TeV $pp$ collision data collected by the ATLAS detector during Run 2 of the LHC (2015-2018). After the application of data-quality requirements [19] to ensure that all detector components were operating normally, the dataset corresponds to an integrated luminosity of $139.0 \pm 2.4$ fb$^{-1}$ [20,21]. Data events used in this analysis were triggered by the presence of a high-$p_\mathrm{T}$ large-$R$ jet, built at trigger level using calorimeter-cell energy clusters calibrated to the hadronic scale with the local cell-signal weighting method [22]. The unprescaled single-large-$R$-jet trigger with the lowest jet-$p_\mathrm{T}$ threshold was used in each year, corresponding to thresholds of 360 GeV in 2015, 420 GeV in 2016, and 440 GeV in 2017 and 2018. Further offline selection criteria are imposed (see Section 6) to ensure that selected events are all in the kinematic region where the trigger is fully efficient.
Monte Carlo (MC) event generators were used to simulate the signal targeted by this search. The simulated event samples include the effect of additional $pp$ interactions (pileup) in the same or neighboring bunch crossings. The effect is modeled by overlaying simulated minimum-bias events on each hard-scatter event and accounting for how interactions in the previous or following bunch crossings affect the detector response. The simulated events are weighted to reproduce the pileup distribution observed in each data-taking period.
The detector response was simulated [23] using Geant4 [24] and the events were reconstructed with the same software as used for data.
A generic HVT [11,25] model with simulated $q\bar{q}$ scattering is used as a baseline for the $Y \rightarrow XH$ signal samples. The $Y$ resonance is assumed to have a width that is narrow compared to the detector resolution. The generation of the $Z' \rightarrow ZH$ process was modified to replace the $Z$ boson with a new spin-1 boson $X$ with a width of 2 GeV and a 100% branching fraction to $d\bar{d}$. The Standard Model Higgs boson was generated with decays into $b\bar{b}$ only. Samples with $Y$ resonance masses of 1-6 TeV were generated using MadGraph5_aMC@NLO 2.7.2 and 2.7.3 [26] interfaced to Pythia 8.2 [27] for parton showering and hadronization, with the NNPDF2.3LO parton distribution function (PDF) set [28], at leading order in QCD, and parameter values set according to the A14 tune [29]. A two-dimensional grid was generated for 195 signal points defined by $Y$ and $X$ mass values, with $Y$ masses in the range 1.5-6 TeV and $X$ masses in the range 65-3000 GeV. The $Y$ and $X$ mass values account for the kinematic constraint $m_Y > m_X + m_H$, and the mass boundaries are dictated by achievable sensitivity. The signal acceptance times efficiency ranges from a few percent to 40% for the various signal-region selections.
In addition to the $Y \rightarrow XH$ signal grid, three signals with different jet topologies were used to craft and assess the degree of model independence of the $X$ anomaly-tagging procedure. They were all generated with Pythia 8 using the same tune (A14) and PDF set (NNPDF2.3LO) as the $Y \rightarrow XH$ signal grid. The HVT configuration from Ref. [30] was used to create generic two-body resonance signatures in which the daughter particles decay into either three light quarks or two heavy-flavor quarks. Choosing as benchmarks a parent mass of 3 TeV and daughter masses of 200 and 400 GeV creates approximately the same kinematic scenario as the $Y \rightarrow XH$ phase space of interest, leading to boosted decay products that are captured as large-$R$ jets. Depending on whether the daughter decays into three light quarks or into $b\bar{b}$, the large-$R$ jet is produced with three-prong substructure or displaced vertices, respectively. Lastly, the Pythia 8 Hidden Valley Model A configuration was used to generate a "dark jet" signal that arises from the decay of a $Z'$ dark-sector mediator into two dark quarks, where the $Z'$ has the same benchmark mass of 3 TeV [31]. These jets contain dark matter particles that do not interact with the detector, creating a hadronization pattern that contains both visible and invisible energy. Such jets are difficult to recognize using traditional substructure variables that focus on specific signal topologies, making them an ideal target for anomaly detection based on background-only characterization.

Object selection
Charged-particle tracks with $p_\mathrm{T} > 500$ MeV measured in the inner detector are used to reconstruct $pp$ collision vertices, and the one with the largest $\sum p_\mathrm{T}^2$ of such tracks is chosen as the event's primary vertex. Jets are built with the anti-$k_t$ algorithm [32] in FastJet [33], using two radius-parameter values to cluster the constituent detector signals: $R = 1.0$ for large-$R$ jets ($J$), and $R = 0.4$ for small-$R$ jets ($j$). Large-$R$ jets must satisfy $p_\mathrm{T} > 200$ GeV and $|\eta| < 2.0$. They are constructed from combinations of tracks and calibrated calorimeter energy clusters known as Track-CaloClusters (TCCs) [34]. The TCCs take advantage of both the excellent energy resolution of the ATLAS calorimeter and the better angular resolution of the ID tracking system at very high energy, where the calorimeter is less able to discern jet substructure. This is particularly advantageous for measuring substructure in highly boosted jets, as is needed in this search to distinguish $Y \rightarrow XH$ signal from multijet background. To minimize the impact of pileup, large-$R$ jets are trimmed [35] by removing any $R = 0.2$ subjets carrying less than 5% of the large-$R$ jet's $p_\mathrm{T}$. The jet energy and mass are calibrated with an MC-based method [36].
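To make the trimming criterion concrete, the following sketch (with hypothetical minimal types; the actual ATLAS reconstruction uses dedicated software) applies the fractional-$p_\mathrm{T}$ filter described above:

```python
from dataclasses import dataclass

@dataclass
class Subjet:
    pt: float  # subjet transverse momentum in GeV

def trim(subjets: list[Subjet], jet_pt: float, f_cut: float = 0.05) -> list[Subjet]:
    """Keep only the R = 0.2 subjets carrying at least a fraction f_cut of the
    large-R jet pT; the real procedure re-clusters the surviving constituents
    into the trimmed jet."""
    return [s for s in subjets if s.pt >= f_cut * jet_pt]

# Example: in a 600 GeV large-R jet, a 20 GeV subjet (3.3% of the jet pT)
# is removed, while the 450 and 120 GeV subjets are kept.
trimmed = trim([Subjet(450.0), Subjet(120.0), Subjet(20.0)], jet_pt=600.0)
```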
Small-$R$ jets are built from particle-flow objects [37], which improve the accuracy of the jet's charged-hadron energy component through measurements of the associated charged-particle momenta in the ID. Small-$R$ jets must have $p_\mathrm{T} > 20$ GeV and $|\eta| < 4.5$. An additional event-level veto is applied to reject events with a jet that is likely to have come from calorimeter noise, beam-induced background, or a cosmic ray [38].
Leptons, while excluded from the analysis region selections, participate in an overlap-removal procedure that prevents double-counting of overlapping objects. Leptons must have $p_\mathrm{T} > 7$ GeV, with $|\eta| < 2.47$ for electrons and $|\eta| < 2.7$ for muons. To ensure that each lepton's track originates from the primary vertex, the transverse impact parameter $d_0$, measured relative to the beam line, must have a significance $|d_0/\sigma(d_0)| < 3\,(5)$ for muons (electrons), and the longitudinal impact parameter $z_0$, from the primary vertex to the point where $d_0$ is measured, must satisfy $|z_0 \sin\theta| < 0.5$ mm. No lepton isolation criteria are applied.
The overlap-removal procedure is applied to all selected leptons and jets. If two electrons share the same track, or the separation between their two energy clusters satisfies $|\Delta\eta| < 0.075$ and $|\Delta\phi| < 0.125$, then the lower-$p_\mathrm{T}$ electron is discarded. Electrons that fall within $\Delta R = 0.02$ of a selected muon are also discarded. For electrons and nearby small-$R$ jets, the jet is removed if the separation between the electron and jet satisfies $\Delta R < 0.2$; the electron is removed if the separation satisfies $0.2 < \Delta R < 0.4$. For muons and nearby small-$R$ jets, the jet is removed if the separation between the muon and jet satisfies $\Delta R < 0.2$ and if the jet has fewer than three tracks or the energy and momentum differences between the muon and the jet are small; otherwise the muon is removed if the separation satisfies $\Delta R < 0.4$. To prevent double-counting of energy from an electron inside a large-$R$ jet, the large-$R$ jet is removed if its separation from the electron satisfies $\Delta R < 1.0$.
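The precedence logic of this procedure can be summarized in a short sketch; the object attributes and helper below are hypothetical, and the track-sharing, cluster-separation, and muon track-multiplicity conditions are abbreviated to their $\Delta R$ components:

```python
import math

def delta_r(a, b):
    """Angular separation between two objects carrying eta/phi attributes."""
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)

def remove_overlaps(electrons, muons, small_jets, large_jets):
    # Electrons within dR = 0.02 of a selected muon are discarded.
    electrons = [e for e in electrons if all(delta_r(e, m) > 0.02 for m in muons)]
    # Electron vs small-R jet: jet removed if dR < 0.2 ...
    small_jets = [j for j in small_jets
                  if all(delta_r(e, j) >= 0.2 for e in electrons)]
    # ... and the electron removed if 0.2 < dR < 0.4.
    electrons = [e for e in electrons
                 if all(not (0.2 < delta_r(e, j) < 0.4) for j in small_jets)]
    # Muon vs small-R jet (the jet-track-multiplicity condition deciding
    # whether the jet or the muon is dropped is omitted here).
    muons = [m for m in muons
             if all(delta_r(m, j) >= 0.4 for j in small_jets)]
    # Large-R jet removed if within dR < 1.0 of a surviving electron.
    large_jets = [J for J in large_jets
                  if all(delta_r(e, J) >= 1.0 for e in electrons)]
    return electrons, muons, small_jets, large_jets
```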

Variational recurrent neural network for anomalous-jet tagging
The anomaly detection in this search is performed with a jet-level Anomaly Score ($S_\mathrm{A}$) [39]. The $S_\mathrm{A}$ value is given by a variational recurrent neural network (VRNN) [40], which consists of a variational autoencoder (VAE) whose latent space is updated at each time step of a recurrent neural network (RNN). The key features of this tool as it pertains to this analysis are summarized below, with a comprehensive description provided in Ref. [39].
Models based on autoencoders are popular for the selection of rare or outlier events in high-energy physics [41][42][43][44][45][46]. Such an architecture is composed of two separate stages: an encoder that compresses the input into a lower-dimensional latent space by identifying its salient features, and a decoder that samples from the latent space and attempts to reproduce the input in its original dimensionality. Minimizing the reconstruction error between input and output is a key element of the training, and allows the autoencoder to implicitly learn the underlying distribution of features in its input dataset, and thus to recognize anomalous data through its inability to reconstruct such events with low error.
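As an illustration of this two-stage structure (not the analysis VRNN itself, and with arbitrary layer sizes), a minimal PyTorch autoencoder and its per-event reconstruction error look as follows:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder compresses the input to a low-dimensional latent space;
    the decoder attempts to reproduce the input from that representation."""
    def __init__(self, n_in: int = 16, n_latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 8), nn.ReLU(), nn.Linear(8, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 8), nn.ReLU(), nn.Linear(8, n_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(32, 16)                    # a toy batch of input features
error = ((model(x) - x) ** 2).mean(dim=1)  # large error flags anomalous inputs
```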
VAEs extend this concept to perform Bayesian inference via a latent space that can accommodate a distribution of encoded information.The VRNN architecture combines the variational inference capabilities of a VAE with the sequence modeling provided by an RNN.The inclusion of a learned, time-dependent prior distribution is an essential feature of the VRNN, and allows the modeling of complex structured sequences with high variability.
The VRNN is trained on large-$R$ TCC jets in the ATLAS Run 2 dataset satisfying the criteria in Section 6 for full trigger efficiency, the jet requirements described in Section 4, and $p_\mathrm{T}(J) > 1.2$ TeV. Its architecture is the same as that described in Ref. [39]. Training is performed using the PyTorch deep-learning library [47], and the network is updated using the Adam optimizer [48] with a learning rate of $10^{-5}$. No regularization via weight decay is applied, but gradient clipping is used with a clip value of 10.
The input jets are modeled as a sequence of up to 20 constituent four-vectors per jet, ordered by $k_t$ splitting starting from the highest-$p_\mathrm{T}$ constituent. The $p_\mathrm{T}$ selection is designed to choose input jets with highly boosted topologies, for which the VRNN is expected to most improve the sensitivity over the substructure-based approach, as these topologies are well described by $k_t$-sorted sequence modeling and are also challenging to distinguish with substructure methods. The training is conditioned on four high-level variables, namely the energy-correlator substructure variable $D_2$ [49,50] for two-prong sensitivity, the $N$-subjettiness ratio $\tau_{32}$ [51] for three-prong sensitivity, and the two $k_t$-splitting-scale variables $\sqrt{d_{12}}$ and $\sqrt{d_{23}}$ [52]. An alignment procedure applied to each jet rescales to the same $p_\mathrm{T}$, boosts to the same energy, and rotates to the same orientation in $\eta$ and $\phi$, inhibiting the VRNN from selecting jets with rare kinematic properties without considering internal constituent structure. This input modeling is designed to reveal correlations between constituents and substructure, allowing the VRNN to distinguish jets with anomalous energy-deposition patterns from the background of homogeneous jets originating from QCD processes.
Since the input consists solely of jets from data, no labeling scheme is used in training, distinguishing this unsupervised method from traditional supervised machine learning, where each input is labeled as signal or background. Of the data meeting the above criteria, 90% is used for training and 10% for validation. No blinding criterion is applied to the training data, so events used in training may also enter the final analysis regions. Since the VRNN performance is consistent across a variety of signal-contamination fractions in training [39], and the presence of signal in the training dataset would only lower the sensitivity and cannot produce a false excess, biasing the final result is not a concern.
The loss function of the VRNN is composed of two terms: a reconstruction error term, which minimizes differences between the decoded result and the original input, and the Kullback-Leibler (KL) divergence $D_\mathrm{KL}$ [53] of the encoded approximate posterior distribution from the Gaussian prior distribution. The loss $\mathcal{L}(t)$ for each time step $t$ is given by Eq. (1), where $\mathbf{x}(t)$ is the input constituent four-vector, $\mathbf{y}(t)$ is the output constituent four-vector, $q$ is the approximate posterior, $p_t$ is the learned prior, and $\beta$ is a constant parameter weighting the $D_\mathrm{KL}$ contribution:

$$\mathcal{L}(t) = \mathrm{MSE}\big(\mathbf{x}(t), \mathbf{y}(t)\big) + \beta\, D_\mathrm{KL}\big(q \,\|\, p_t\big). \tag{1}$$

Here, the time step of the RNN refers to the position in the input sequence. The learned prior $p_t$ is a function of the current time step's hidden state, and the hidden state is updated via a recurrence relation after each time step. An overall loss $\mathcal{L}$ for the sequence is then computed by averaging the individual time-step losses over the length of the sequence.
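A minimal sketch of the per-time-step loss, assuming (as is standard for VAEs) diagonal Gaussian posterior and prior so that $D_\mathrm{KL}$ has a closed form; tensor names are illustrative and do not correspond to the actual VRNN implementation:

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form D_KL( q || p ) between diagonal Gaussians, summed over
    latent dimensions."""
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0,
        dim=-1,
    )

def step_loss(x_t, y_t, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Eq. (1): reconstruction error between the input and decoded constituent
    four-vectors plus the beta-weighted KL term between the approximate
    posterior q and the learned prior p_t of the current time step."""
    recon = torch.mean((x_t - y_t) ** 2, dim=-1)
    return recon + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```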
The KL divergence of the encoded posterior distribution from the true posterior distribution is jointly minimized along with the loss $\mathcal{L}(t)$, ensuring that the architecture accurately describes the underlying distribution of the data [54]. The $D_\mathrm{KL}$ term has been shown to provide better discrimination between anomalous and standard jets than either the reconstruction error or the loss as a whole. The Anomaly Score is therefore defined in terms of the $D_\mathrm{KL}$ of each constituent, averaged over the whole jet, and restricted to the range (0, 1) via exponentiation, as shown in Eq. (2):

$$S_\mathrm{A} = 1 - \exp\big(-\langle D_\mathrm{KL} \rangle\big), \tag{2}$$

where $\langle D_\mathrm{KL} \rangle$ denotes the constituent-averaged KL divergence. The resulting variable $S_\mathrm{A}$ ensures that more-anomalous jets populate higher values and background populates lower values. The performance of the VRNN can be assessed by applying Eq. (2) to both background and a variety of signal models, and studying the discrimination power of $S_\mathrm{A}$. Its performance in this analysis, and the use of the Anomaly Score to select the $X$ jet, are described in the following section.

Event selection
The experimental signature of the $Y \rightarrow XH$ signal contains at least two jets with high transverse momentum. Signal-like events are selected through event-level criteria, along with jet-tagging criteria for both the $X$ particle and the $H$ boson. In total, the analysis is performed three times, once for each of the three signal regions (SRs).
The three analysis categories are defined by the selection criteria for the $X$ particle. Each signal region utilizes five background estimation regions, composed of three control regions (CRs) and two validation regions (VRs), which are defined using common selections on the Higgs boson. The definitions of these regions and the motivation for the selection criteria are provided in this section.
The analysis uses data events passing a high-$p_\mathrm{T}$-threshold single-large-$R$-jet trigger. Selected events must have a primary vertex and at least two large-$R$ jets. Only the two highest-$p_\mathrm{T}$ large-$R$ jets in an event are considered further, and at least one of their masses must exceed 50 GeV to remove poorly reconstructed events. To ensure that the trigger is fully efficient, the invariant mass of the two large-$R$ jets must satisfy $m_{JJ} > 1.3$ TeV and the leading large-$R$ jet, $J_1$, must have $p_\mathrm{T}(J_1) > 500$ GeV. These requirements, along with a selection on the $D_{Hbb}$ score defined in the next subsection, compose the analysis preselection.
As mentioned previously, the three analysis SRs vary in the selection criteria for the $X$ particle. The primary SR, using the VRNN for generic $X$ tagging, is referred to as the anomaly SR. The other two SRs target the benchmark $X \rightarrow q\bar{q}$ decay and are thus called the two-prong SRs. Here the jet substructure is used to distinguish reconstruction of the $X$ particle as either a single large-$R$ jet (merged SR) or two small-$R$ jets (resolved SR). They are not required to be orthogonal to the anomaly SR, but do not overlap with each other. The analysis CRs include the high sideband (HSB) of the Higgs boson candidate mass $m_H$, used to generate the DNN reweighting for the background estimation procedure, and CR0, which provides the final background template for the SR. The VRs are defined in the low sideband (LSB) of the Higgs boson candidate mass, and validate the reweighting derived from the HSBs. The full analysis flow is shown in Figure 2, with selection details provided below.

X/H ambiguity resolution
Since the SRs are constructed by placing requirements on the different properties of the $X$ and $H$ large-$R$ jets, an ambiguity must be resolved to determine which of the two jets in the event is more likely to be the Higgs boson and thus subject to the Higgs boson selection criteria. This is done using a neural-network (NN) classifier that separates bosons decaying into $b\bar{b}$ from top-quark and QCD jets [55]. The tagger is trained on the large-$R$ jet $p_\mathrm{T}$ and $\eta$, along with the subjet flavor-tagging score DL1r [56] for up to three track-subjets constructed with a jet-$p_\mathrm{T}$-dependent radius parameter [57]. The tagger version used here includes a reweighting of all training inputs to have the same $p_\mathrm{T}$ and $\eta$ distributions, to minimize biasing of the tagger towards high-$p_\mathrm{T}$ or central jets, respectively. The outputs of the NN are three classification scores, corresponding to the probabilities for the jet to have originated from a Higgs boson ($p_\mathrm{Higgs}$), a top quark ($p_\mathrm{top}$), or the multijet process ($p_\mathrm{multijet}$), and these are subsequently combined into the jet-level discriminant $D_{Hbb}$:

$$D_{Hbb} = \ln\left(\frac{p_\mathrm{Higgs}}{f_\mathrm{top}\, p_\mathrm{top} + (1 - f_\mathrm{top})\, p_\mathrm{multijet}}\right). \tag{3}$$

In Eq. (3), $f_\mathrm{top}$ determines the weight assigned to the top-quark background shape in the final discriminant, and its chosen value of 0.25 is based on signal-to-noise optimization studies.
To perform the large-$R$ jet ambiguity resolution, $D_{Hbb}$ is computed for both jets in the event. The jet with the larger value of $D_{Hbb}$ is labeled as the Higgs boson candidate ($J_H$), and the other jet is by default the $X$ candidate ($J_X$), thus determining which jet is subject to further $H$ and $X$ selection. This procedure has an accuracy of over 90% in the highly boosted region of the signal grid ($m_X/m_Y < 0.5$), decreasing to no less than 75% for less-boosted mass points where the boosted $H \rightarrow b\bar{b}$ tagging is less efficient. The resulting distribution of $D_{Hbb}$ for the jet chosen as the Higgs boson candidate, for both data and representative $Y \rightarrow XH$ signals at preselection, is shown in Figure 3. Additionally, a preselection of $D_{Hbb} > -2$ is applied to remove events that are determined to be not $H \rightarrow b\bar{b}$-like, thus ensuring that the data-driven background estimation focuses on a phase-space region close to that of the signal.
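The discriminant of Eq. (3) and the ambiguity resolution reduce to a few lines; the sketch below uses illustrative names and takes the three NN class scores for each of the two leading large-$R$ jets:

```python
import math

def d_hbb(p_higgs: float, p_top: float, p_multijet: float, f_top: float = 0.25) -> float:
    """Jet-level discriminant of Eq. (3)."""
    return math.log(p_higgs / (f_top * p_top + (1.0 - f_top) * p_multijet))

def resolve_ambiguity(scores_1, scores_2):
    """Return (index of J_H, index of J_X), where scores_i is the tuple
    (p_Higgs, p_top, p_multijet) for jet i."""
    d1, d2 = d_hbb(*scores_1), d_hbb(*scores_2)
    return (0, 1) if d1 >= d2 else (1, 0)

# Example: jet 0 is strongly H -> bb-like, jet 1 is multijet-like.
h_idx, x_idx = resolve_ambiguity((0.85, 0.05, 0.10), (0.10, 0.10, 0.80))
```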

X selection
After the $X$ candidate jet is determined, a region-dependent selection is applied to enhance the anomalous-jet content in the anomaly, two-prong merged, and two-prong resolved regions. The anomaly region is defined using the Anomaly Score $S_\mathrm{A}$ of the $J_X$ jet as determined by the VRNN. Figure 4 shows the resulting distribution of $X$ candidate $S_\mathrm{A}$ values after the analysis preselection is applied, comparing data with three different $Y \rightarrow XH$ baseline signal samples as well as the three signals with alternative $X$ decays. The absence of signal information in the construction of the score means that it is not expected to outperform analytic or supervised methods for any single signal model. However, using the VRNN approach to characterize the background distribution provides broad sensitivity to a variety of signals that differ from jets originating from QCD processes. The most notable difference in $S_\mathrm{A}$ shape is found for dark jets, which have a substructure that is not well characterized by existing variables. The dependence of the sensitivity on kinematics is also relevant to performance, as the $S_\mathrm{A}$ distribution for the highly resolved point ($m_Y = 5000$ GeV, $m_X = 2500$ GeV) populates lower, and thus less anomalous, values than the data, making this signal indistinguishable from background in the anomaly detection approach. A selection of $S_\mathrm{A} > 0.5$ is chosen for the $J_X$ candidate, and provides modest but comparable discrimination power in data for the disparate signal-jet topologies considered, across the full expected $p_\mathrm{T}$ range of the $X$ candidate jets. This enhances the signal-to-background ratio for the two boosted points, ($m_Y = 2000$ GeV, $m_X = 300$ GeV) and ($m_Y = 3400$ GeV, $m_X = 110$ GeV), by approximately 25% relative to the inclusive selection.
Regions focusing on the benchmark $X \rightarrow q\bar{q}$ decay are used to supplement the anomaly detection analysis and interpret the results. The merged region is defined by applying a selection $D_2^\mathrm{trk} < 1.2$, where $D_2^\mathrm{trk}$ is the same as $D_2$ but computed using only tracks associated with the jet. The choice of using only tracks in the $D_2$ calculation is motivated by the ability to propagate track-only uncertainties into the final signal efficiency. Detailed comparisons of $D_2^\mathrm{trk}$ with the standard $D_2$ variable at analysis preselection have shown the two to be highly compatible, ensuring sensitivity to two-prong signals with respect to the multijet background. Because jets with two-prong substructure have lower values of $D_2^\mathrm{trk}$, this selection enhances the presence of fully merged decays where the $X$ decay products are well contained by a single large-$R$ jet. Since the efficiency of substructure variables is mainly driven by the boost of the large-$R$ jet and therefore its $p_\mathrm{T}$, the optimal cut value of 1.2 was chosen so as to maximize signal-vs-background discrimination across the full expected $p_\mathrm{T}$ range (see Figure 5).

Due to kinematic constraints driven by the $m_X/m_Y$ ratio, events that fail the merged selection typically also show poor reconstruction of the $X$ mass by the large-$R$ jet. In these events, the large-$R$ jet does not sufficiently contain all of the $X$ decay products, leading to an underestimate of the $X$ particle mass. To achieve better sensitivity in this regime, a resolved selection is defined by requiring the $D_2^\mathrm{trk}$ value of $J_X$ to be greater than 1.2, corresponding to a less-boosted $X$ particle that is more appropriately reconstructed using two small-$R$ jets. A dedicated algorithm matches two small-$R$ jets to the $X$ by requiring at least four small-$R$ jets in the event, discarding the two with the smallest $\Delta R$ to the Higgs boson candidate $J_H$, and selecting the two remaining small-$R$ jets with the highest $p_\mathrm{T}$, as sketched below. The mass of the $Y$ resonance is then computed using the Higgs boson candidate large-$R$ jet and the two small-$R$ jets matched to the $X$. Additional requirements of $|\Delta y_{j_1 j_2}| < 2.5$ and a $p_\mathrm{T}$ imbalance $p_\mathrm{T}^\mathrm{bal} < 0.8$ are applied to aid accurate resolved $X$ reconstruction; here $\Delta y$ refers to the separation in rapidity and $p_\mathrm{T}^\mathrm{bal}$ is defined as the difference of the $X$ particle's small-$R$ jet transverse momenta divided by their sum. The resolved region improves the signal efficiency for points with higher $m_X/m_Y$ (corresponding to less boost for the $X$ candidate decay) by a factor of approximately five relative to the boosted selection.
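The resolved $X$ matching can be sketched as follows, assuming jet objects with hypothetical pt, rapidity, eta, and phi attributes:

```python
import math

def delta_r(a, b):
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)

def match_resolved_x(small_jets, j_higgs):
    """Require at least four small-R jets, drop the two closest in dR to the
    Higgs candidate J_H (which typically overlap with it), then take the two
    highest-pT survivors, subject to the rapidity-separation and pT-balance
    requirements from the text."""
    if len(small_jets) < 4:
        return None
    by_dr = sorted(small_jets, key=lambda j: delta_r(j, j_higgs))
    j1, j2 = sorted(by_dr[2:], key=lambda j: j.pt, reverse=True)[:2]
    if abs(j1.rapidity - j2.rapidity) >= 2.5:        # |Delta y| < 2.5
        return None
    if abs(j1.pt - j2.pt) / (j1.pt + j2.pt) >= 0.8:  # pT balance < 0.8
        return None
    return j1, j2  # m_Y is then built from J_H and these two jets
```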

H selection
The Higgs boson tagging is performed after the $J_X$ selection is applied, and sorts the events into the three analysis categories, for which the background is estimated. For all signal regions, a requirement of $D_{Hbb} > 2.44$, providing a constant 60% efficiency across jet $p_\mathrm{T}$, is applied to the Higgs boson candidate $J_H$, along with a mass-window requirement of $75 < m_H < 145$ GeV. The data-driven background estimation uses events from the HSB of the Higgs boson candidate mass ($145 < m_H < 200$ GeV), which are further separated into HSB0 and HSB1 depending on whether the Higgs boson candidate $J_H$ fails or passes the $D_{Hbb}$ selection. Validation is performed in the LSB, where the reconstructed Higgs boson candidate mass must be between 65 and 75 GeV. LSB0 and LSB1 are similarly defined as having Higgs boson candidates that fail or pass the $D_{Hbb}$ tagging criterion, respectively. CR0 is defined as the set of events where the Higgs boson candidate mass is in the SR window but the candidate fails the $D_{Hbb}$ tagging. DNN-based reweighting is applied to CR0 to obtain the final background prediction in the SR. The contamination from signal events is found to be of the order of a few percent in the HSB, and no more than 10% in the LSB, for all mass points in the $Y \rightarrow XH$ grid.
In total, 3 SRs and 15 background estimation regions, 5 for each SR, are used in the analysis.A summary of all region definitions is provided in Table 1.

Binning optimization
The signal search in the two-dimensional space of $m_Y$ vs. $m_X$ applies a variable-width sliding window to the $X$ candidate mass spectrum, dividing the data into a series of overlapping $m_X$ ranges in which the $m_Y$ distribution is fitted. The chosen binning for $m_Y$ and $m_X$ is based on the ATLAS detector's mass resolution for a generic narrow-width resonance, given by the fitted width of the Gaussian core of a double-sided Crystal Ball function. Modifications are made to the binning to account for the limited number of data events. In the $m_Y$ spectrum, bins are widened at higher values of $m_Y$ to ensure that at least one event is present in each bin when using LSB1 data. For the $m_X$ categories, an initial set of mass windows is generated from a linear fit of the $m_X$ resolution as a function of $m_X$. The first window is chosen to have a bin center at $m_X = 65$ GeV, the lowest generated signal $X$ mass, and its width is chosen to be twice the resolution obtained from the fit. Subsequent windows, increasing in $m_X$, are chosen in the same way, with bin centers placed above the previous window's bin center by half of the previous window's mass resolution. The widths of high-mass windows are expanded symmetrically, based on event counts in LSB1, until at least 10 events are present in each final $m_X$ bin. To avoid duplicate windows, all resulting windows that share at least one bin edge are replaced by a single window encompassing their union. Finally, to accommodate the highest signal mass point at $m_X = 3000$ GeV, the last bin's upper edge is placed at 3200 GeV.
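The core of the window construction (before the event-count-based widening and the merging of duplicate windows) can be sketched as below, where `resolution` stands for the linear fit of the $m_X$ resolution described above, with a hypothetical parameterization used in the example:

```python
def make_mx_windows(resolution, m_start=65.0, m_stop=3200.0):
    """Generate overlapping m_X windows: each window is centred on a mass
    point with a width of twice the fitted resolution, and successive centres
    advance by half of the previous window's resolution."""
    windows, centre = [], m_start
    while centre < m_stop:
        sigma = resolution(centre)
        windows.append((centre - sigma, centre + sigma))
        centre += 0.5 * sigma
    return windows

# Example with an assumed resolution parameterization sigma(m_X) = a + b * m_X:
windows = make_mx_windows(lambda m: 5.0 + 0.05 * m)
```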

Background estimation
The expected SM background in the SRs arises mainly from high-$p_\mathrm{T}$ multijet events, with jets that imitate the $H \rightarrow b\bar{b}$ or $X$ topologies. Simulation of such QCD processes is known to suffer from mismodeling and is computationally expensive. Therefore, the background estimation in this analysis is fully data-driven and uses regions that are orthogonal to the SR because of their Higgs boson jet criteria.
The shape of the expected $m_Y$ distribution in the SR is obtained from data in CR0, using weights derived such that, when applied to HSB0, they reproduce the shape found in HSB1. This procedure is validated by applying the weights to data in LSB0 and comparing the resulting $m_Y$ spectrum with that observed in LSB1 data. The weights are generated inclusively in the $X$ candidate selection, and applied separately to create three background predictions, one for each SR.
The reweighting function is defined as the ratio of the multidimensional probability density function (pdf) of data in HSB1 to that of data in HSB0. In this analysis, the statistical procedure of direct importance estimation [58,59] is utilized, where the ratio is estimated directly from data without analytically computing each individual pdf. It is implemented via the training of a DNN, where a custom square-root loss function [60] is chosen to produce weights that accurately reproduce the observed ratio in data. The DNN outputs event-level weights that are assumed to be approximately independent of $m_Y$, an assumption that can be verified using events in the LSB, and thus can be applied to an untagged region to produce the $m_Y$ shape in the corresponding $H \rightarrow b\bar{b}$-tagged region.
Events are considered for training the DNN if they pass the analysis preselection, satisfy $145 < m_H < 175$ GeV, and additionally have at least two track-jets associated with the Higgs boson candidate jet. They are modeled as a set of variables, namely the transverse momentum, pseudorapidity, azimuthal angle $\phi$, and energy of the Higgs boson candidate; the number of tracks associated with the Higgs boson candidate; and the transverse momentum, pseudorapidity, $\phi$, and mass of the first two track-jets associated with the Higgs boson candidate, ordered in $p_\mathrm{T}$. Each variable $x$ is standardized with the transformation $x' = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the distribution of $x$.
The DNN is built using a fully connected sequential model from Keras [61] with three inner layers, each with 20 neurons and a rectified linear unit (ReLU) activation function. To reduce overfitting during training, 10% of the connections among inner layers are randomly severed ("dropout"). The last layer has a single output with a linear activation function. The model is trained using the Adam optimizer [62] in Keras, with TensorFlow [63] as its backend. Training is performed with a batch size equal to the full dataset size for up to 1600 epochs, with early stopping if the loss calculated on the validation dataset does not decrease for 100 consecutive epochs.
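A sketch of this architecture in Keras is shown below. The custom square-root loss of Ref. [60] is not reproduced; a placeholder loss is compiled instead, and the feature count reflects the thirteen input variables listed above:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n_features = 13  # H-candidate kinematics, track multiplicity, two track-jets

# Three hidden layers of 20 ReLU neurons with 10% dropout, and a single
# linear output, as described in the text.
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(20, activation="relu"),
    layers.Dropout(0.10),
    layers.Dense(20, activation="relu"),
    layers.Dropout(0.10),
    layers.Dense(20, activation="relu"),
    layers.Dropout(0.10),
    layers.Dense(1, activation="linear"),
])

# Placeholder objective: the analysis instead uses a custom square-root loss
# to estimate the HSB1/HSB0 density ratio directly (direct importance estimation).
model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
```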
The output weights of the DNN are validated using data in the LSB. Figure 6 shows the impact of the reweighting on distributions of several key analysis variables, using the two-prong merged LSB VR as an example. Three curves are shown for each variable, comparing the LSB0 data, before and after DNN reweighting is applied, with the target data distribution in LSB1. These variables are chosen so as to focus on kinematic properties of the Higgs boson candidate, which are used to distinguish the CR from the SR, and on variables involved in the final statistical treatment. Good agreement between the reweighted shape and the true tagged data is observed for each variable, suggesting a robust background model. Since the training is performed inclusively in the $X$-tagging, the same conclusion holds for the anomaly and two-prong resolved LSB regions, which is verified by comparing distributions between the reweighted LSB0 and LSB1 in those regions. Also, since the reweighting is applied to $m_Y$ distributions after the $X$-tagging selection is applied, no extrapolation across $D_2^\mathrm{trk}$ or $S_\mathrm{A}$ is required, and the method shows similar background modeling across all three SRs. Any minor residual differences between the predicted background and data in the SRs are covered by the nonclosure systematic uncertainty, described in Section 8.

Systematic uncertainties
Systematic uncertainties affect both the data-driven background estimate and the simulated signal. All background uncertainties are estimated using an $m_Y$ shape that is inclusive in $m_X$, and are applied to each $m_X$ category fully correlated among $m_Y$ bins. Background shape variations are derived from three sources.
The first comes from training the DNN in an alternative region of $165 < m_H < 200$ GeV, to account for the way in which phase-space uncertainties may affect the obtained weights. This region has approximately the same number of events and tagging efficiency as the nominal training region, helping to isolate the effect of changing only the DNN model. Upward and downward variations are defined by symmetrizing the shape difference in $m_Y$ between the two models, yielding an effect of $\mathcal{O}(1{-}10)\%$ across the distribution.
Another DNN variation accounts for the finite size of the training sample and the random initialization of the weights. It is estimated with a bootstrap procedure [64,65] in which a set of 100 bootstrap networks is trained. For each training, the training dataset is varied by resampling it with replacement, assigning to each event a new weight drawn from a Poisson distribution with a mean of one. Two additional templates are formed to create a symmetric error band by taking the median weight for each event and adding or subtracting half of the interquartile range. This corresponds to an $\mathcal{O}(1)\%$ effect across $m_Y$.
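The construction of the symmetric variation templates from the bootstrap ensemble amounts to a per-event median and interquartile range; a sketch with toy stand-ins for the 100 sets of network outputs:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_events, n_bootstraps = 10_000, 100

# Toy stand-in for the per-event weights returned by each of the 100
# bootstrap trainings (each training resamples the dataset with
# replacement using Poisson(1) event weights).
boot_weights = rng.normal(loc=1.0, scale=0.05, size=(n_bootstraps, n_events))

median = np.median(boot_weights, axis=0)
q25, q75 = np.percentile(boot_weights, [25, 75], axis=0)
half_iqr = 0.5 * (q75 - q25)

w_up = median + half_iqr      # upward variation template
w_down = median - half_iqr    # downward variation template
```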
Lastly, a nonclosure uncertainty is included to cover modeling discrepancies that may arise when weights derived from the NN training in the HSB are extrapolated to the LSB, and subsequently to the SR. It is defined by the symmetrized shape difference between the data and the predicted background in the LSB. To avoid sensitivity to statistical fluctuations in the LSB, smoothing is applied to the variation, which is rebinned to reduce the relative statistical uncertainty. The nonclosure uncertainty is negligible at low $m_Y$ and rises to $\mathcal{O}(10)\%$ in the $m_Y$ tails.
Normalization and shape uncertainties are applied to the simulated signals. An uncertainty of 1.7% is applied to the luminosity measured with the LUCID-2 detector [66]. The uncertainty from the trigger selection is negligible, and thus not included, since the requirement of $m_{JJ} > 1.3$ TeV ensures that the trigger is fully efficient.
Scale factors (SFs) computed to match the $H \rightarrow b\bar{b}$ tagger efficiency between data and simulation are applied to the signal template, and their uncertainties are propagated to the signal normalization. These SFs are computed using the methodology described in Ref. [67], with the substitution of an updated $D_{Hbb}$ that includes the kinematic reweighting of inputs described in Section 6. They are binned in large-$R$-jet $p_\mathrm{T}$, where the SF for the highest-$p_\mathrm{T}$ bin is extrapolated to cover the upper end of the $p_\mathrm{T}$ region probed by the analysis selection. Agreement between data and MC simulation for these SFs is verified with $Z \rightarrow b\bar{b}$ events in calibration studies. The SFs are bounded by approximately 1.1 and 1.4 across all $p_\mathrm{T}$ bins, with uncertainties in the range of approximately ±(0.3-0.5).
Instrumental systematic uncertainties arise from uncertainties in the jet $p_\mathrm{T}$ scale and resolution, for both the large-$R$ and small-$R$ jets, and affect the shape of the $m_Y$ distribution. Uncertainty in the large-$R$-jet $p_\mathrm{T}$ scale is an important effect in the search for resonant structures on top of rapidly falling background spectra, as it can shift the peak of the resonance. It is evaluated using track-to-calorimeter $p_\mathrm{T}$ double ratios between data and simulation, where any observed differences are assigned as baseline systematic uncertainties [68]. Even though the analysis relies on jets built from TCCs, the total $p_\mathrm{T}$ of the jet is still derived solely from calorimeter information, keeping it independent of the track-based jet $p_\mathrm{T}$. Past analyses have studied the possible impact of calorimeter- vs. track-based $p_\mathrm{T}$ by cross-calibrating between per-jet TCC and calorimeter $p_\mathrm{T}$ and found it to be negligible [30]. The total jet-$p_\mathrm{T}$-scale uncertainty varies with $p_\mathrm{T}$ and $\eta$ and is typically around ±5%. Uncertainties from the reconstruction and modeling of tracks are also taken into account; these cover the track-reconstruction efficiency, impact-parameter resolution, tracking in dense environments, fake-track rates, and sagitta biases. The impact of the large-$R$-jet $p_\mathrm{T}$ resolution uncertainty is evaluated event by event by rerunning the analysis with an additional absolute 2% Gaussian smearing applied to the input jets' $p_\mathrm{T}$ to degrade the nominal resolution. Small-$R$-jet $p_\mathrm{T}$ scale and resolution uncertainties are similarly estimated by comparing data with simulation after applying in situ corrections [69].
Several sources of theoretical uncertainty affecting the signal models were considered. Uncertainties in the matrix-element calculation are evaluated by varying the strong coupling constant ($\alpha_\mathrm{s}$), the renormalization scale ($\mu_\mathrm{r}$), and the factorization scale ($\mu_\mathrm{f}$). The effects of uncertainties in the parton distribution functions are evaluated by comparing the $m_Y$ distributions obtained when using various alternative PDF sets and taking an envelope of these distributions, as prescribed by the PDF4LHC group [70]. Generator-level variations of the A14 tune's parameter values are used to cover the initial- and final-state radiation (ISR and FSR) and multiple parton interaction (MPI) uncertainties. An overall conservative 3% normalization uncertainty is applied to all signals as a result of ISR/FSR/MPI modeling effects.

Statistical analysis
The statistical framework of the analysis is used to perform hypothesis tests in the SRs, testing the compatibility of the data with both the background-only and signal-plus-background hypotheses. The fitted observable is the $m_Y$ distribution of the data in the SR. This fit is repeated several times in overlapping bins of the $X$ candidate mass.
The parameter of interest in the statistical analysis is the signal strength $\mu$, defined as a scale factor multiplying the nominal yield predicted by a 1 pb signal cross section so as to match the observed number of signal events. The background-only hypothesis corresponds to $\mu = 0$. The normalization of the data-driven background estimate is allowed to float, with each normalization factor fitted independently because the $m_X$ categories overlap. A test statistic based on the profile likelihood ratio, calculated in the lowest-order asymptotic approximation, is used to test the models corresponding to the signal grid. Systematic uncertainties are included in the fit as nuisance parameters (NPs) with Gaussian constraints. Both the signal strength and all signal systematic-uncertainty NPs are correlated across the merged and resolved regions. The significance of an observed excess of data over the background prediction is quantified by the local $p$-value, which is the probability that the background-only model produces an excess at least as large as the one observed.
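For reference, the profile-likelihood-ratio test statistic takes the standard form (generic notation, not quoted from this analysis):

$$\lambda(\mu) = \frac{L\big(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu)\big)}{L\big(\hat{\mu}, \hat{\boldsymbol{\theta}}\big)}, \qquad q_\mu = -2 \ln \lambda(\mu),$$

where $\boldsymbol{\theta}$ denotes the nuisance parameters, single hats denote the unconditional maximum-likelihood estimators, and $\hat{\hat{\boldsymbol{\theta}}}(\mu)$ maximizes the likelihood for a fixed signal strength $\mu$.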
BumpHunter [71] is used to find excesses in both the anomaly and two-prong SRs, with an algorithm that incorporates only the statistical uncertainty of the data and does not depend on a specific signal shape. It outputs a $p$-value that provides a goodness-of-fit metric, along with an interval of the invariant mass corresponding to the largest deviation of the data from the background prediction. It is thus used in all signal regions to search for significant deviations of the data from the background $m_Y$ distribution. Dedicated studies indicate that the fits are dominated by statistical uncertainties, further motivating the omission of systematic uncertainties in the BumpHunter fits. In the anomaly SR, no subsequent fits are performed using the $Y \rightarrow XH$ signal model, as this analysis is designed to keep signal-model dependence minimal and the two-prong regions are expected to have higher signal sensitivity across the $Y \rightarrow XH$ grid. However, given the absence of a significant excess in the two-prong regions, signal-plus-background fits are performed for each $Y \rightarrow XH$ signal model, and 95% confidence-level (CL) upper limits on the signal cross section are set using a modified frequentist method (CL$_\mathrm{s}$) [72].
The background estimation and statistical treatment are validated by comparison with data in the signal-depleted LSB VR for each of the three analysis categories. BumpHunter $p$-values from all $m_X$-bin fits in the VR of the anomaly region approximate a flat distribution between 0 and 1, indicating good background modeling with no systematic biases across the phase space. Figure 7 shows the post-fit background prediction compared with the data passing the anomaly, merged, and resolved selections, in an example $m_X$ window between 284.5 and 322.5 GeV where the whole kinematic phase space is well populated. Good modeling is observed, and no significant pulls or constraints are seen in the background NPs. In the two-prong regions, checks are made for spurious signals in the LSB through a signal-plus-background fit using each generated signal model. These checks indicate that no significant spurious signal is present beyond those likely to be produced by statistical fluctuations, with no systematic trend across the expected phase space.

Results
Results of background-only fits of the $m_Y$ distribution across all $m_X$ categories in the anomaly SR show the data to be compatible with the expected backgrounds given their uncertainties. Figure 8 shows the distribution of $p$-values across $m_X$ and $m_Y$ bins, where the background is the result of a background-only fit performed with all statistical and systematic uncertainties, and all uncertainties are included in the $p$-value calculation. The distribution of observed $p$-values is statistically compatible with the expectation, with a $\chi^2$ per degree of freedom of 1.32 between the distributions.
The largest observed excess in the anomaly SR is in the $m_X$ window [75.5, 95.5] GeV. In this window, the greatest single-bin excess is found for $m_Y$ between 3608 and 3805 GeV. The BumpHunter interval with the greatest significance covers the $m_Y$ range between bin edges of 3424 and 3805 GeV. The $m_Y$ distribution corresponding to this $m_X$ window is shown in Figure 9, along with the post-fit expected background. Studies of the individual jet mass distributions in this $m_X$ category do not reveal any excesses in data consistent with a resonant particle decay. Given the number of individual search regions in this analysis, the impact of the trials factor is significant. A calculation is made to determine the global significance of the greatest single-bin deviation, accounting for all the overlapping $m_X$ bins. First, the overlapping bin edges are used to define exclusive, nonoverlapping bins in $m_X$, and an integer is drawn from a Poisson distribution with a mean equal to the background expectation in each exclusive ($m_X$, $m_Y$) bin. Next, these yields are summed across exclusive bins to create a toy estimate for each overlapping bin, in which the $p$-value is computed. Finally, this procedure is repeated $N$ times, where $N$ is the number of events, inclusively, across all exclusive, nonoverlapping bins. This calculation yields a global significance of 1.43 standard deviations for the excess in the [3608, 3805] GeV bin of $m_Y$.
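The trials-factor calculation can be sketched as a toy Monte Carlo over the exclusive bins; in this sketch the per-bin local $p$-value is reduced to a simple Poisson tail probability, and the bookkeeping names are illustrative:

```python
import numpy as np
from scipy.stats import poisson

def global_p(bkg_exclusive, overlap_map, p_local_obs, n_toys=10_000, seed=0):
    """bkg_exclusive: background expectation in each exclusive (m_X, m_Y) bin;
    overlap_map: for each overlapping analysis bin, the indices of the
    exclusive bins it contains. Returns the fraction of toys whose smallest
    local p-value is at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_toys):
        toy = rng.poisson(bkg_exclusive)              # fluctuate exclusive bins
        p_min = 1.0
        for idx in overlap_map:                       # rebuild overlapping bins
            n, b = toy[idx].sum(), bkg_exclusive[idx].sum()
            p_min = min(p_min, poisson.sf(n - 1, b))  # P(N >= n | b)
        if p_min <= p_local_obs:
            n_extreme += 1
    return n_extreme / n_toys
```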
Results for the two-prong SRs are similarly derived by performing background-only fits and scanning with BumpHunter for incompatibility with data. No significant deviations of the data from the predicted $m_Y$ background are observed beyond expected statistical fluctuations, in either the merged or resolved SR. An example $m_Y$ distribution in both the merged and resolved SRs for the $m_X$ bin [284.5, 322.5] GeV is shown in Figure 10, along with the background estimate determined from a background-only fit accounting for all uncertainties. Figure 11 shows a summary of the per-bin $p$-values in each $m_Y$ bin for selected $m_X$ bins centered on key $X$ candidate mass hypotheses, namely the $W$, $Z$, and $H$ boson masses.
Given the absence of significant excesses in the data, signal-plus-background fits are performed to determine the 95% CL upper limit on the cross section of the $pp \rightarrow Y \rightarrow XH$ process. A summary of the expected and observed limits in the two-dimensional grid of $Y \rightarrow XH$ signals is given in Figure 12, combining results from a simultaneous fit of the merged and resolved regions with all statistical and systematic uncertainties. A bilinear interpolation procedure is applied to provide results between fully simulated signal points, and the boundaries of the limits correspond to the simulated signal grid. The analysis is most sensitive in the very boosted regime, where the $Y$ mass is approximately an order of magnitude larger than the $X$ mass. Sensitivity is lowest in the highly resolved regime, where the required large-$R$ reconstruction of the Higgs boson sculpts the signal efficiency to favor high-momentum $X$ particles. The observed limits range from a cross section of 0.341 fb for the signal point ($m_Y = 5000$ GeV, $m_X = 600$ GeV) to 1.22 pb for the signal point ($m_Y = 2500$ GeV, $m_X = 2000$ GeV).
The data in the anomaly and two-prong SRs can be used to provide a benchmark comparison of sensitivity across the set of large-$R$ jet topologies considered for the $X$ decay, thereby providing a metric for assessing the level of signal-model dependence in both regions. The 95% CL upper limits on the production cross section of several benchmark signals are calculated for all three signal-region selections, by injecting signal into the data until the BumpHunter $p$-value corresponds to a significance of 2 standard deviations. Seven signal points are considered in this study, including three highly boosted $Y \rightarrow XH$ points, one resolved $Y \rightarrow XH$ point, and the three alternative jet topologies. Because the systematic uncertainties in the signal efficiency of the Anomaly Score are not assessed, this comparison is performed using only statistical uncertainties and a post-fit background estimate in the limit calculation.
Since the merged region uses $D_2^\mathrm{trk}$ and thus explicitly tags the two-prong substructure of the $X$ candidate's large-$R$ jet in the generated $Y \rightarrow XH$ grid, it is possible that this region will outperform the fully unsupervised approach for the $Y \rightarrow XH$ signals. Figure 13 provides a summary of the limits obtained from the signal-injection tests, comparing all three signal-region selections for the seven benchmark signal points. For points where the $X$ particle is highly boosted and thus the Anomaly Score is most sensitive, the upper limit on the cross section is approximately the same across the merged and anomaly SRs. The $D_2^\mathrm{trk}$ selection for this analysis is a general one; further optimization of the $D_2^\mathrm{trk}$ selection would be expected to provide better performance than the Anomaly Score across two-prong simulated signals. However, the signal-model-independent aspect of the anomaly detection used in the anomaly SR is more evident in the improved limits on the alternative substructure signals. Notably, for the dark jets the cross-section upper limit is nearly an order of magnitude lower than that provided by the two-prong regions, which underlines the particular strength of the model-independent approach for signals that are challenging to characterize with existing high-level variables. While these results cover regions of phase space that have not been studied directly by other searches, some analysis selections overlap with those of other recent ATLAS dijet resonance searches. The $m_X$ bin of [75.5, 95.5] GeV would be sensitive to a $VH$ resonance's hadronic final state, which is covered by a dedicated analysis using the same dataset [73]. The present approach differs in the tagging techniques used for both the vector boson and the Higgs boson, but provides a similar 95% CL upper limit on the production cross section of a 3 TeV resonance. Due to its generality, the anomaly SR is expected to be sensitive to the same signatures as the weakly supervised dijet resonance search [13], but a direct comparison is not provided because of the assumptions made here about the mass and decay of the Higgs boson candidates.

Conclusion
A search is performed for a heavy new boson $Y$ decaying into a new particle $X$ and a Standard Model Higgs boson in 139 fb$^{-1}$ of 13 TeV $pp$ collision data collected by the ATLAS detector during LHC Run 2. The analysis targets a fully hadronic final state, where the $X$ particle and $H$ boson are boosted such that their daughter particles are collimated. In the first application of fully unsupervised machine learning to an ATLAS search, a VRNN is trained on jets in data to define an anomaly detection SR, which selects the $X$ particle based solely on its substructural incompatibility with background jets. Two supplementary SRs are designed to separately reconstruct merged and resolved decays of a nominal two-prong $X$ benchmark signal. Sensitivity over the dominant multijet background is enhanced by additional machine-learning applications, namely an NN-based $H \rightarrow b\bar{b}$ tagger and a DNN-based reweighting to ensure good background modeling. No significant deviations of the data from the predicted background are observed. The largest excess is found in the anomaly SR, with a global significance of 1.43 standard deviations when considering all $m_X$ and $m_Y$ bins, and is not found to be compatible with the expected signal shape. Results are interpreted as 95% confidence-level upper limits on the cross section $\sigma(pp \rightarrow Y \rightarrow XH \rightarrow q\bar{q}b\bar{b})$, across the two-dimensional space where $m_Y$ is between 1.5 and 6 TeV and $m_X$ is between 65 and 3000 GeV. The most stringent upper limit of 0.341 fb is obtained in the merged SR for the signal point ($m_Y = 5000$ GeV, $m_X = 600$ GeV), while the weakest upper limit of 1.22 pb is obtained for the highly resolved point ($m_Y = 2500$ GeV, $m_X = 2000$ GeV).

Figure 1 :
Figure 1: Feynman diagram of the target signal process, where the Y is produced in the initial $pp$ collision and decays to a fully hadronic final state via a SM Higgs boson ($H \rightarrow b\bar{b}$) and a new particle X. The only assumption applied to the X decay is that it yields a hadronic final state.
The training jets are reweighted in their $p_T$ and $\eta$ distributions, to minimize biasing of the tagger towards high-$p_T$ or central jets respectively. The outputs of the NN are three classification scores corresponding to the probability for the jet to have originated from a Higgs boson ($p_{\mathrm{Higgs}}$), a top quark ($p_{\mathrm{top}}$), or the multijet background. The analysis regions are then defined by the following selections:
• X-tagging: a selection on $D_2^{\mathrm{trk}}$ (exclusion regions: merged and resolved, the latter additionally requiring small-radius jets with $|y| < 2.5$ and $p_T$ balance < 0.8), or a selection on the Anomaly Score (discovery regions).
• H-tagging: based on the $D_{Hbb}$ score of the Higgs candidate and its mass value, the regions shown in Figure 2 are defined.
In total, three signal regions (merged, resolved, and anomaly) and 15 control/validation regions (five non-SR regions for each SR) are defined:
• LSB1 (low sideband 1) and LSB0 (low sideband 0) are the $D_{Hbb}$-tagged and -untagged regions (passing or failing the 60% WP) in the low sideband (60 GeV < $m_H$ < 75 GeV).
• SR (signal region) and CR0 (control region 0) are the $D_{Hbb}$-tagged and -untagged regions in the Higgs mass window (75 GeV < $m_H$ < 145 GeV).
• HSB1 (high sideband 1) and HSB0 (high sideband 0) are the $D_{Hbb}$-tagged and -untagged regions in the high sideband (145 GeV < $m_H$ < 200 GeV).
An additional cut of $D_{Hbb} > -2$ is applied to all regions, purely to skim unwanted events, since signal is expected to have $D_{Hbb}$ above the chosen WP. The SR is used to perform the signal search, while the high sidebands and the low sidebands are used to develop and validate the background model respectively. The $-2.0 < D_{Hbb} < 2.46$ selection for the untagged regions allows the collection of sufficient statistics to perform the background estimation.
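As a minimal illustration of the region scheme above, the following sketch assigns an H candidate to a region from its mass and $D_{Hbb}$ score. The mass windows, the skim cut, and the 60% working-point threshold ($D_{Hbb} > 2.44$, taken from Figure 3) follow the text; the function and variable names are illustrative, not the analysis code.

D_HBB_WP = 2.44          # 60%-efficiency working point (Figure 3)
D_HBB_SKIM = -2.0        # skim cut applied to all regions

def assign_region(m_H, d_hbb):
    """Return the analysis region for an H candidate, or None if skimmed."""
    if d_hbb <= D_HBB_SKIM:
        return None                      # outside all regions
    tagged = d_hbb > D_HBB_WP            # passes the 60% WP
    if 60.0 < m_H < 75.0:                # low sideband
        return "LSB1" if tagged else "LSB0"
    if 75.0 < m_H < 145.0:               # Higgs mass window
        return "SR" if tagged else "CR0"
    if 145.0 < m_H < 200.0:              # high sideband
        return "HSB1" if tagged else "HSB0"
    return None

# Example: a 110 GeV candidate passing the WP lands in the signal region.
assert assign_region(110.0, 3.1) == "SR"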

Figure 2 :
Figure 2: Scheme of the analysis regions based on cuts on $D_{Hbb}$ and $m_H$. Here, the HSBs are used as the control regions, and the LSBs as the validation regions.

Figure 3 :
Figure 3: Distributions of the H candidate $D_{Hbb}$ score in data after preselection requirements are applied. Also shown are three $X \rightarrow q\bar{q}$ simulated signals, labeled by the masses of the Y and X particles. Requiring $D_{Hbb} > 2.44$ defines a working point that is 60% efficient for the selection of the boosted $H \rightarrow b\bar{b}$ topology across the full $p_T$ range. All distributions are normalized to unity.
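A fixed-efficiency working point of this kind can be derived as a quantile of the tagger-score distribution for simulated signal jets: keeping 60% of boosted $H \rightarrow b\bar{b}$ jets means cutting at their 40% score quantile. The sketch below shows the recipe on toy scores; the numbers are placeholders, not the calibration used in the analysis (which additionally enforces a flat efficiency across $p_T$).

import numpy as np

rng = np.random.default_rng(0)
signal_scores = rng.normal(loc=3.0, scale=1.0, size=100_000)  # toy tagger scores

target_efficiency = 0.60
# Keeping the top 60% of signal jets means cutting at the 40% quantile.
threshold = np.quantile(signal_scores, 1.0 - target_efficiency)

efficiency = (signal_scores > threshold).mean()
print(f"threshold = {threshold:.2f}, signal efficiency = {efficiency:.2%}")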

Figure 4 :
Figure 4: Distributions of the X candidate Anomaly Score in data after preselection requirements are applied. Also shown are three $X \rightarrow q\bar{q}$ simulated signals (left), labeled by the masses of the Y and X particles, and the three additional signals with alternative X decay hypotheses, namely heavy-flavor, three-prong, and dark jet (right). A selection of Anomaly Score > 0.5 applied to the X candidate defines the anomaly analysis category. All distributions are normalized to unity.
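The analysis derives this score from a VRNN trained directly on jets in data; as a much-simplified stand-in for that architecture, the sketch below scores a jet by how poorly a recurrent autoencoder, trained on background-dominated data, reconstructs its constituent sequence. All shapes, features, and hyperparameters are placeholders, and the mapping of the reconstruction error into (0, 1) is illustrative only.

import torch
import torch.nn as nn

class JetAutoencoder(nn.Module):
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):                      # x: (batch, n_const, n_features)
        _, h = self.encoder(x)                 # summary of the constituent sequence
        # Feed the jet summary at every step and ask for the sequence back.
        z = h.transpose(0, 1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(z.contiguous())
        return self.head(out)

def anomaly_score(model, x):
    """Per-jet reconstruction error; larger = more background-incompatible."""
    with torch.no_grad():
        err = (model(x) - x).pow(2).mean(dim=(1, 2))
    # Map to (0, 1) so a cut such as score > 0.5 can define a category.
    return 1.0 - torch.exp(-err)

jets = torch.randn(8, 20, 3)   # 8 toy jets, 20 constituents, 3 features each
model = JetAutoencoder()
print(anomaly_score(model, jets))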

Figure 5 :
Figure 5: Distributions of the X candidate $D_2^{\mathrm{trk}}$ in data after preselection requirements are applied. Also shown are three $X \rightarrow q\bar{q}$ simulated signals, labeled by the masses of the Y and X particles. A value of $D_2^{\mathrm{trk}} = 1.2$ separates the analysis events into merged (< 1.2) and resolved (> 1.2) two-prong categories. All distributions are normalized to unity.
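Combining the thresholds quoted in Figures 4 and 5, the X-candidate categorization can be summarized as below. The categories are not made mutually exclusive here; the names and overlap treatment are illustrative only, not the analysis logic.

def x_candidate_categories(d2_trk, anomaly_score):
    """Return the set of analysis categories an X candidate enters."""
    cats = set()
    if anomaly_score > 0.5:                  # Figure 4 selection
        cats.add("anomaly")
    if d2_trk < 1.2:                         # Figure 5 split
        cats.add("two-prong merged")
    else:
        cats.add("two-prong resolved")
    return cats

print(x_candidate_categories(d2_trk=0.8, anomaly_score=0.7))
# -> anomaly and two-prong merged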

Figure 6 :
Figure 6: Distribution of (a) $p_T$ and (b) $\eta$ for the H candidate, along with (c) the mass and (d) $D_2^{\mathrm{trk}}$ of the X candidate, in the merged LSB validation region, overlaying data from LSB1 (solid black points) with the data in LSB0 shown before (orange histogram) and after (red circles) reweighting. The ratio of the LSB1 data to both the LSB0 data (orange) and the reweighted LSB0 data (red) is shown in the lower panel. Error bars indicate statistical uncertainties.
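One standard recipe for such a reweighting (a sketch of the general technique, not necessarily the DNN implementation used in the analysis) is to train a classifier to separate the untagged (LSB0) from the tagged (LSB1) events and weight each LSB0 event by $p/(1-p)$, an estimate of the likelihood ratio between the two samples. The toy 2D features below stand in for the kinematic inputs.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
lsb0 = rng.normal(0.0, 1.0, size=(20_000, 2))    # untagged sample (toy)
lsb1 = rng.normal(0.3, 1.1, size=(5_000, 2))     # tagged sample (toy)

X = np.vstack([lsb0, lsb1])
y = np.concatenate([np.zeros(len(lsb0)), np.ones(len(lsb1))])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                    random_state=0).fit(X, y)

p = clf.predict_proba(lsb0)[:, 1]
weights = p / (1.0 - p)                          # likelihood-ratio estimate
weights *= len(lsb1) / weights.sum()             # normalize to the tagged yield

# Kish effective sample size: how much statistical power survives weighting.
print(f"effective LSB0 stats: {weights.sum()**2 / (weights**2).sum():.0f}")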

Figure 7 :
Figure 7: Reconstructed $m_Y$ distributions of the background, as determined by a background-only fit, and data in the LSB1 VR, for the anomaly, two-prong merged, and two-prong resolved selections, in the $m_X$ bin [284.5, 322.5] GeV. The ratio of the observed data to the background is shown in the lower panel. The uncertainty band includes both the statistical and systematic effects.
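For orientation, a background-only fit of this kind typically adjusts a smooth, monotonically falling parameterization to the mass spectrum. The sketch below uses the three-parameter dijet function common in hadronic resonance searches; whether this matches the exact functional form used in the analysis is an assumption, and the spectrum is a toy.

import numpy as np
from scipy.optimize import curve_fit

SQRT_S = 13_000.0  # GeV

def dijet(m, p0, p1, p2):
    """Standard dijet-search shape: p0 * (1 - x)^p1 * x^(-p2), x = m / sqrt(s)."""
    x = m / SQRT_S
    return p0 * (1.0 - x) ** p1 * x ** (-p2)

# Toy falling spectrum in place of an observed m_Y histogram.
edges = np.linspace(1500.0, 6000.0, 46)
centers = 0.5 * (edges[:-1] + edges[1:])
truth = dijet(centers, 1e-2, 10.0, 4.0)
counts = np.random.default_rng(2).poisson(truth * 1e4) / 1e4

popt, _ = curve_fit(dijet, centers, counts, p0=[1e-2, 10.0, 4.0])
print("fitted parameters:", popt)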

Figure 8 :
Figure 8: The distribution of observed $p$-values across all $m_X$ and $m_Y$ bins in the anomaly signal region, comparing data with the background estimates generated by a background-only fit, displayed in the two-dimensional ($m_X$, $m_Y$) grid. The $p$-value calculation is performed at the center of each $m_Y$ bin, and all statistical and background systematic uncertainties are considered. The lowest observed $p$-value corresponds to the bin with $m_Y$ within [3608, 3805] GeV and $m_X$ within [75.5, 95.5] GeV.

Figure 9 :
Figure 9: The $m_Y$ distribution associated with the $m_X$ bin [75.5, 95.5] GeV in the anomaly SR, where the background is determined by a background-only fit to data with both the statistical and systematic uncertainties included. The $p$-value and the lower panel of per-bin significances are computed with statistical uncertainties only. The single $m_Y$ bin with the most significant excess is [3608, 3805] GeV. Incorporating the trials factor for all $m_X$ and overlapping $m_Y$ bins in the search, this excess has a global significance of 1.43$\sigma$.
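The trials-factor correction accounts for the number of places an excess could have appeared. As a back-of-envelope illustration (the analysis derives its global significance from the full scan, and the bins here are neither independent nor equal to the placeholder count below):

from scipy.stats import norm

z_local = 2.9            # hypothetical local significance of an excess
n_bins = 50              # hypothetical effective number of (m_X, m_Y) bins

p_local = norm.sf(z_local)
p_global = 1.0 - (1.0 - p_local) ** n_bins   # valid for independent bins
z_global = norm.isf(p_global)

print(f"local {z_local:.2f} sigma -> global {z_global:.2f} sigma")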

Figure 10 :
Figure 10: Reconstructed $m_Y$ distributions of background and data in the SR, for the merged (left) and resolved (right) selections, in the $m_X$ bin [284.5, 322.5] GeV. The background is determined by a background-only fit to the data with all statistical and systematic uncertainties included. The ratio of the observed data to the background is shown in the lower panel. The uncertainty band includes both the statistical and systematic effects.

Figure 11 :
Figure 11: The $p$-value per $m_Y$ bin for both two-prong SRs, calculated using all systematic and statistical uncertainties of the background estimate. Two $m_X$ bins are shown, [75.5, 95.5] and [113.0, 137.0] GeV, which correspond to windows containing the $W$/$Z$ and Higgs boson masses respectively. Events thus fall in (a) a merged $W$/$Z$ window, (b) a merged Higgs window, (c) a resolved $W$/$Z$ window, and (d) a resolved Higgs window. The background is determined by a background-only fit to the data with all statistical and systematic uncertainties included. In both $m_X$ windows, the $p$-value has a high, approximately constant value in the high $Y$-mass region of the resolved SR (bottom), as this phase-space region is far more likely to produce a highly boosted X that falls in the merged SR selection.

Figure 12 :
Figure 12: The expected (left) and observed (right) 95% CL limits on the cross section $\sigma(pp \rightarrow Y \rightarrow XH \rightarrow q\bar{q}b\bar{b})$ in pb in the two-dimensional space of $m_Y$ versus $m_X$, obtained from a simultaneous fit of both the merged and resolved two-prong signal regions with all statistical and systematic uncertainties. A bilinear interpolation procedure is applied to provide results between fully simulated signal points. The observed limits range from 0.34 fb for the signal point ($m_Y$ = 5000 GeV, $m_X$ = 600 GeV) to 1.22 pb for the signal point ($m_Y$ = 2500 GeV, $m_X$ = 2000 GeV). The boundaries of the limit are defined by the simulated signal grid.
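The bilinear interpolation between simulated grid points can be sketched as follows. The grid coordinates and limit values are toy placeholders, and interpolating in log space (since cross-section limits span orders of magnitude) is an implementation choice made here, not necessarily the one made in the paper.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

m_Y = np.array([1500.0, 2500.0, 4000.0, 6000.0])       # GeV (toy grid)
m_X = np.array([65.0, 600.0, 2000.0, 3000.0])          # GeV (toy grid)
limits = np.array([[2.0, 1.5, 9.0, 12.0],              # toy upper limits (pb)
                   [0.8, 0.5, 1.2, 4.0],
                   [0.1, 0.05, 0.3, 0.9],
                   [0.02, 0.01, 0.08, 0.2]])

# "linear" on a 2D regular grid is exactly bilinear interpolation.
interp = RegularGridInterpolator((m_Y, m_X), np.log(limits), method="linear")

point = np.array([[3000.0, 400.0]])                    # (m_Y, m_X) query
print(f"interpolated limit: {np.exp(interp(point))[0]:.3f} pb")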

Figure 13 :
Figure 13: The 95% CL upper limit on the cross section for seven benchmark signal processes, comparing the anomaly, two-prong merged, and two-prong resolved signal region selections. The cross section is obtained by injecting signal into the observed data until 2$\sigma$ sensitivity is achieved, as determined by the BumpHunter interval $p$-value. The background estimate is the result of a fit with all background uncertainties considered, while the BumpHunter algorithm incorporates only statistical uncertainties. The uncertainty band reflects the standard deviation of the cross-section values obtained over 100 trials. All signals are labeled by the masses of the BSM particles except the three-prong cascade signals, for which $m_Y$ = 3 TeV and the particles in the X decay chain have masses of 200 GeV and 400 GeV. Compared to the $D_2^{\mathrm{trk}}$-defined regions, the anomaly SR performs competitively for all signals considered, and considerably better for the dark-jet signal, indicating a high degree of signal-model-agnostic sensitivity. The red upward-pointing arrow for the three-prong cascade signal indicates that the limit obtained from the two-prong (merged) SR lies above the upper edge of the plot.
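The injection logic can be illustrated schematically: scale a signal template until the most significant window of the expected spectrum reaches a 2$\sigma$ excess over the background. The analysis itself uses the BumpHunter interval $p$-value and repeats the procedure over 100 Poisson-fluctuated trials (whose spread gives the uncertainty band); the Gaussian window significance and toy shapes below are simplifications standing in for that machinery.

import numpy as np

def max_window_significance(obs, exp, width=3):
    """Largest (obs - exp) / sqrt(exp) over all sliding windows of `width` bins."""
    sig = [
        (obs[i:i + width].sum() - exp[i:i + width].sum())
        / np.sqrt(exp[i:i + width].sum())
        for i in range(len(obs) - width + 1)
    ]
    return max(sig)

bins = np.arange(40)
background = 1000.0 * np.exp(-0.1 * bins)               # falling toy spectrum
signal_shape = np.exp(-0.5 * ((bins - 20) / 2.0) ** 2)  # toy resonance
signal_shape /= signal_shape.sum()                      # unit-yield template

mu, step = 0.0, 5.0                                     # injected signal yield
while max_window_significance(background + mu * signal_shape, background) < 2.0:
    mu += step

print(f"injected yield giving ~2 sigma: {mu:.0f} events")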

Table 1 :
Preselection requirements, and optimized requirements defining the SRs and background estimation regions.