Search for long-lived particles using displaced vertices and missing transverse momentum in proton-proton collisions at √s = 13 TeV

A search for the production of long-lived particles in proton-proton collisions at a center-of-mass energy of 13 TeV at the CERN LHC is presented. The search is based on data collected by the CMS experiment in 2016–2018, corresponding to a total integrated luminosity of 137 fb−1. This search is designed to be sensitive to long-lived particles with mean proper decay lengths between 0.1 and 1000 mm, whose decay products produce a final state with at least one displaced vertex and missing transverse momentum. A machine learning algorithm, which improves the background rejection power by more than an order of magnitude, is applied to improve the sensitivity. The observation is consistent with the standard model background prediction, and the results are used to constrain split supersymmetry (SUSY) and gauge-mediated SUSY breaking (GMSB) models with different gluino mean proper decay lengths and masses. This is the first CMS search sensitive to hadronically decaying long-lived particles in signals with mass differences between the gluino and neutralino below 100 GeV. It sets the most stringent limits to date for split-SUSY models and GMSB models with gluino proper decay lengths below 6 mm.

The search presented here is sensitive to LLPs whose decays include multiple charged particles and one weakly interacting neutral particle. To target this signature, the search is designed to be sensitive to topologies that include missing transverse momentum (p_T^miss) and at least one displaced vertex. A displaced vertex is reconstructed by identifying charged-particle trajectories converging at a point in space that is at a measurable distance from the proton-proton (pp) interaction point. Reconstructed vertices are required to be within a transverse displacement of approximately 2 cm, which ensures they are composed of well-measured trajectories and eliminates the possibility of background vertices caused by SM particle interactions with the CMS beam pipe.
The ATLAS and CMS Collaborations have previously performed searches for LLPs in signatures with displaced vertices [22-26]. Those searches target LLPs that either decay at least 1 mm away from the interaction point, or transfer most of their energy to SM decay products. The search presented in this paper is designed to cover the gaps left by previous searches: it covers a wider range of LLP mean proper decay lengths (cτ), from 0.1 to 1000 mm, and it targets LLPs in "compressed" scenarios, where most of the LLP energy is carried away by undetected weakly interacting particles. This search includes several improvements over the previous CMS displaced vertex search [24]. One such improvement is the inclusion of events with as few as one displaced vertex. This allows sensitivity to new physics models with only one LLP in the final state. It also enhances sensitivity to beyond-the-SM scenarios in which the reconstruction of displaced vertices is challenging, such as models in which less energy is carried by charged particles in the final state, or cases where only one of the LLP decays occurs within the detector. Although requiring only one displaced vertex increases the signal efficiency, it also admits more background events. To further discriminate between signal and background events, the "interaction network" (IN) [27,28], a machine learning (ML) algorithm, is introduced to improve the sensitivity to the target signature. An alternative selection without ML was studied and found to yield a factor of 8 more background events for a smaller signal efficiency. In addition, this search includes events with an overall total transverse momentum (p_T) imbalance, making it sensitive to final states with a significantly smaller scalar sum of jet p_T. The search uses events collected in pp collisions at the LHC in 2016-2018, corresponding to an integrated luminosity of 137 fb−1. This analysis uses split SUSY and GMSB SUSY as benchmark signal models, as shown in Fig. 1.

The split-SUSY model features pair-produced, long-lived gluinos (g̃) that each decay into two quarks (q) and a neutralino (χ̃). In this model, the SUSY breaking scale (m_SUSY) is assumed to be m_SUSY ≫ 10^6 TeV [2], and all scalar masses are set to that scale, except for a single, fine-tuned Higgs boson mass. The g̃ decays through a high-mass, virtual squark, resulting in a long g̃ lifetime. In the GMSB SUSY model, g̃ are pair-produced and decay to a gluon and a nearly massless gravitino (G̃). The decay is suppressed by m_SUSY, and thus the g̃ is long-lived. The lifetime of the gluino in both models is given by τ ≃ 8 (m_SUSY / 10^6 TeV)^4 (1 TeV / m_g̃)^5 s, where m_g̃ corresponds to the g̃ mass [2,11]. Previously, the CMS Collaboration performed a search for the split-SUSY signal model [29]; however, the displaced signature was not targeted, and the resulting large number of background events decreased the sensitivity.

Figure 1: In the split-SUSY model, a pair of long-lived gluinos is produced, and each decays to two quarks and one neutralino. In the GMSB SUSY model, a pair of long-lived gluinos is produced, and each decays to a gluon and a gravitino.
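For orientation, the lifetime scaling can be evaluated numerically. The sketch below assumes the approximate split-SUSY relation τ ≈ 8 s × (m_SUSY/10^6 TeV)^4 × (1 TeV/m_g̃)^5; the function name and the chosen parameter values are illustrative, not taken from the analysis.

```python
C_MM_PER_S = 3.0e11  # speed of light, in mm/s

def gluino_ctau_mm(m_susy_tev, m_gluino_tev):
    """Mean proper decay length c*tau (mm) from the approximate lifetime
    relation tau ~ 8 s * (m_SUSY / 10^6 TeV)^4 * (1 TeV / m_gluino)^5."""
    tau_s = 8.0 * (m_susy_tev / 1.0e6) ** 4 * (1.0 / m_gluino_tev) ** 5
    return tau_s * C_MM_PER_S

# A SUSY-breaking scale of 10^4 TeV and a 2 TeV gluino give ctau = 750 mm,
# inside the 0.1-1000 mm range targeted by this search.
print(gluino_ctau_mm(1.0e4, 2.0))
```

The strong fourth-power dependence on m_SUSY is what maps the narrow cτ window 0.1-1000 mm onto a broad range of SUSY-breaking scales.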
The paper is organized as follows. Section 2 presents an overview of the CMS detector. Section 3 summarizes the data and simulation samples used in this search. We describe the displaced vertex reconstruction algorithm in Section 4, followed by the discussion of the ML algorithm in Section 5. The event selection criteria and the determination of the vertex reconstruction and ML tagging efficiencies are summarized in Section 6. In Section 7, we explain the background estimation procedure based on control samples in data. Section 8 discusses systematic uncertainties in the signal efficiency and background estimation. Section 9 presents the results and statistical interpretation. Finally, the paper is summarized in Section 10. Tabulated results are provided in the HEPData record for this analysis [30].

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m inner diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [31].
The silicon tracker used in 2016 measured charged particles within the range |η| < 2.5. For nonisolated particles within the ranges 1 < p_T < 10 GeV and |η| < 1.4, the track resolutions were typically 1.5% in p_T and 25–90 µm in the transverse impact parameter (d_xy) [32]. At the start of 2017, a new pixel detector was installed [33]; the upgraded tracker measured particles up to |η| < 3.0 with typical resolutions of 1.5% in p_T and 20–75 µm in d_xy [34] for nonisolated particles with 1 < p_T < 10 GeV.
A particle-flow (PF) algorithm [35] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
The primary vertex is taken to be the vertex corresponding to the hardest scattering in the event, evaluated using tracking information alone, as described in Section 9.4.1 of Ref. [36].
For each event, hadronic jets are clustered from PF candidates using the infrared- and collinear-safe anti-k_T algorithm [37,38] with a distance parameter of 0.4. Jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the entire p_T spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings (referred to as pileup) can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified as originating from pileup vertices are discarded, and an offset correction is applied for remaining contributions. Jet energy corrections are derived from simulation to bring the measured response of jets to that of particle-level jets on average. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to account for any residual differences in the jet energy scale between data and simulation [39]. The jet energy resolution typically amounts to 15–20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [39]. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures [40].
The vector p_T^miss is computed as the negative vector p_T sum of all the PF candidates except muons in an event, and its magnitude is denoted as p_T^miss [41]. The vector p_T^miss is modified to account for corrections to the energy scale of the reconstructed jets in the event. Anomalous events with high p_T^miss can be due to a variety of reconstruction failures, detector malfunctions, or noncollision backgrounds. Such events are rejected by event filters that are designed to identify more than 85–90% of the spurious high-p_T^miss events with a mistagging rate of less than 0.1% [41].
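As a minimal illustration of this definition, the sketch below computes the magnitude of the negative vector p_T sum for toy (p_T, φ) candidates; the jet energy corrections and the muon treatment described above are omitted.

```python
import math

def ptmiss(candidates):
    """Magnitude of the negative vector pT sum of (pt, phi) candidates."""
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py)

# Two back-to-back 100 GeV objects balance, giving ptmiss close to zero,
# while a single unbalanced 250 GeV jet yields ptmiss equal to its pT.
print(ptmiss([(100.0, 0.0), (100.0, math.pi)]))
print(ptmiss([(250.0, 1.2)]))
```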
Events of interest are selected using a two-tiered trigger system. The first level, referred to as the level-1 (L1) trigger, is composed of custom hardware processors and uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of about 4 µs [42]. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage [43].

Event samples and preselection
Events in both data and simulation are selected using a trigger requiring p_T^miss > 120 GeV. Events are further required to pass a set of quality filters [41], which reduce the possibility of mismeasuring p_T^miss, and to have a well-reconstructed primary vertex. In addition, an offline requirement of p_T^miss > 200 GeV is applied to ensure consistent trigger efficiencies in data and simulation. These selections are referred to as the event preselection criteria. Since the CMS detector operated under different conditions during successive data-taking periods, events collected in 2016, 2017, and 2018 are analyzed separately.
Signal events are simulated using PYTHIA 8.240 [44] with the CP2 tune [45] and the NNPDF3.1 leading-order (LO) [46] set of parton distribution functions (PDFs). The gluino production cross sections are normalized to next-to-next-to-LO (NNLO) accuracy in quantum chromodynamics (QCD) with next-to-next-to-leading-logarithm (NNLL) soft-gluon resummation [47]. In order to account for possible kinematic effects of bound-state formation, the production of bound states that include a gluino is modeled with PYTHIA. The probability of forming a bound state of gluinos alone is set to the default value of 0.1. Since the search targets only displaced vertices within the beam pipe, interactions of the LLPs and their potential bound states with detector material are not considered. The split-SUSY samples are generated for various m_g̃ hypotheses in the range from 1200 to 2600 GeV, with the mass splitting between the g̃ and χ̃ ranging from 20 to 800 GeV and the gluino cτ ranging from 0.1 to 1000 mm. The GMSB SUSY samples are generated with m_g̃ ranging from 1800 to 2800 GeV and the gluino cτ ranging from 0.1 to 1000 mm.
The dominant background processes are jets produced through the strong interaction, referred to as QCD events, and tt̄ production. Other contributions arise from W and Z boson production in association with jets (W/Z + jets), single top quark production, and diboson production (WW, WZ, and ZZ). The QCD and W/Z + jets events are generated using MADGRAPH5 aMC@NLO 2.6.5 [48] at LO in QCD, with the MLM [49] prescription for matching jets from matrix element calculations to those from parton showers. The tt̄ events are generated using MADGRAPH5 aMC@NLO 2.6.1 [48] at next-to-LO (NLO), with the FxFx [50] prescription for jet matching. Single top quark production is generated using POWHEG [51-55] (version 2.0 for t-channel and 1.0 for tW-channel production) at NLO in QCD, and the diboson processes are generated using PYTHIA 8.240. The NNPDF3.1 NNLO [56] PDF set is used for all background processes. The background event generators are interfaced with PYTHIA to model parton showering and fragmentation. The PYTHIA parameters affecting the description of the underlying event are set to the CP5 tune [45].
All generated events are processed through a simulation of the CMS detector based on GEANT4 [57], and are reconstructed with the same algorithms as used for data.Simulated minimum bias events are superimposed on the hard interaction to describe the effect of pileup, and samples are weighted to match the pileup distribution observed in data.

Vertex reconstruction
A detailed description of the displaced vertex reconstruction algorithm can be found in Ref. [24]. The displaced vertices are constructed from tracks. A set of quality criteria is applied to remove tracks with large reconstruction uncertainty, and a displacement criterion is applied to reduce the contribution of tracks from prompt SM processes. The quality requirements include p_T > 1 GeV, at least one associated detector hit measured by the innermost layer of the pixel detector, at least two associated detector hits measured by the pixel detector, and at least six associated detector hits measured by the silicon strip detector. The displacement criterion is based on d_xy measured with respect to the beam axis and its uncertainty (σ_dxy): the tracks selected to construct vertices are required to have a minimum significance |d_xy/σ_dxy|.

Each pair of selected tracks is used to construct a vertex using the Kalman filtering approach [58-60], and the vertex is considered valid if its χ² per degree of freedom is less than five. These reconstructed vertices then undergo an iterative merging procedure. This procedure utilizes the significance, defined as the ratio of a quantity and its uncertainty, of the 3D distance between vertices (d_vv and σ_dvv), |d_vv/σ_dvv|, and the significance of the 3D impact parameter between tracks and vertices (d_tv and σ_dtv), |d_tv/σ_dtv|. First, for each pair of vertices that share at least one track and have |d_vv/σ_dvv| < 4, a new vertex is fit with the Kalman filtering method using all tracks associated with the pair of vertices. If the resulting vertex satisfies the χ² requirement, the pair of vertices is replaced by the single merged vertex. If not, the vertices are not merged, and each shared track is assigned to one or neither of the two vertices according to the following arbitration criteria. If |d_tv/σ_dtv| formed from either of the vertices is greater than five, the track is dropped from that vertex; if |d_tv/σ_dtv| formed from both vertices is less than 1.5, the track is assigned to the vertex with the most tracks; otherwise, the track is assigned to the vertex that has the smaller |d_tv/σ_dtv|. The resulting vertices then replace the original vertices if they meet the χ² requirement. This process is repeated until no vertices share tracks.
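The shared-track arbitration can be summarized in a few lines. The sketch below is a plain-Python restatement of the three rules above; the function and argument names are ours, not from the analysis code.

```python
def arbitrate_shared_track(sig_a, sig_b, ntracks_a, ntracks_b):
    """Assign a track shared by vertices A and B following the arbitration
    rules: sig_a, sig_b are |d_tv/sigma_dtv| of the track with respect to
    each vertex. Returns 'A', 'B', or 'neither' (dropped from both)."""
    keep_a = sig_a <= 5.0  # the track is dropped from a vertex if its
    keep_b = sig_b <= 5.0  # significance w.r.t. that vertex exceeds 5
    if not keep_a and not keep_b:
        return "neither"
    if not keep_a:
        return "B"
    if not keep_b:
        return "A"
    if sig_a < 1.5 and sig_b < 1.5:
        # compatible with both vertices: keep it in the larger vertex
        # (the tie-breaking order for equal track counts is our choice)
        return "A" if ntracks_a >= ntracks_b else "B"
    return "A" if sig_a < sig_b else "B"  # otherwise: smaller significance

print(arbitrate_shared_track(1.0, 1.2, 6, 3))  # both compatible -> larger vertex: A
```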
The final step of the vertex reconstruction aims to suppress vertices in which tracks from different primary vertices coincidentally overlap. For each displaced vertex, we repeat the vertex fit with one track removed at a time and calculate the resulting change in the vertex position along the beam axis. If the vertex position along the beam axis shifts by more than 50 µm, a threshold chosen to maximize signal significance in Ref. [24], the track is removed from the vertex, and the procedure is iteratively repeated with the remaining tracks. The procedure stops if only two tracks remain in the vertex. All resulting vertices are required to satisfy the χ² requirement.
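A sketch of this pruning loop, with a stand-in `fit_vertex` callable that returns the fitted position along the beam axis (here in µm; in the real reconstruction each step is a full Kalman refit):

```python
def prune_vertex(tracks, fit_vertex, max_shift_um=50.0):
    """Iteratively drop tracks whose removal shifts the refitted vertex
    position along the beam axis by more than max_shift_um; stop once
    only two tracks remain."""
    tracks = list(tracks)
    changed = True
    while changed and len(tracks) > 2:
        changed = False
        z_full = fit_vertex(tracks)
        for i in range(len(tracks)):
            z_without = fit_vertex(tracks[:i] + tracks[i + 1:])
            if abs(z_full - z_without) > max_shift_um:
                tracks.pop(i)  # remove the track and refit with the rest
                changed = True
                break
    return tracks

# Toy fit: vertex z is the mean of the track z positions.
mean_z = lambda zs: sum(zs) / len(zs)
print(prune_vertex([0.0] * 9 + [1000.0], mean_z))  # the outlier is dropped
```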
A set of selection criteria, similar to that of Ref. [24], is applied to the vertices in order to suppress vertices that arise from SM processes. First, the number of tracks in each vertex, n_track, must be ≥3. To suppress vertices from material interactions, the transverse distance from the center of the beam pipe to the vertex must be less than the radius of the beam pipe, which is approximately 2 cm. Finally, the uncertainty in the transverse distance between the beam axis and the vertex (d_BV), σ_dBV, must be <25 µm. This criterion helps to reject vertices from decays of Lorentz-boosted particles, such as B mesons, which tend to have larger σ_dBV.
This search specifically targets events with at least one displaced vertex. The value of n_track serves as a discriminant between signal and background events. Vertices with n_track of at least five are considered signal vertices, while vertices with n_track of three or four are used as control samples for the background estimation. In cases where there are multiple displaced vertices, the one with the highest number of tracks is selected for signal extraction.
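These selection steps can be written compactly. The dictionary keys below are hypothetical field names chosen for the sketch, not those of the CMS software:

```python
def select_signal_vertex(vertices, beampipe_r_cm=2.0):
    """Apply the vertex selection described above and pick the vertex used
    for signal extraction. Each vertex is a dict with keys 'n_track',
    'd_bv_cm' (transverse distance to the beam axis, cm) and
    'sigma_d_bv_um' (its uncertainty, um). Returns a vertex or None."""
    selected = [
        v for v in vertices
        if v["n_track"] >= 3
        and v["d_bv_cm"] < beampipe_r_cm   # inside the beam pipe
        and v["sigma_d_bv_um"] < 25.0      # well-measured displacement
    ]
    if not selected:
        return None
    # with multiple displaced vertices, keep the one with the most tracks
    return max(selected, key=lambda v: v["n_track"])
```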

Machine learning algorithm
The vertex reconstruction and selection largely suppress backgrounds from SM LLPs, such as B and K mesons. The remaining background events are dominated by tt̄ and QCD events, with background vertices originating from unrelated tracks that coincidentally overlap with each other. To further reduce such background vertices, the IN algorithm is used in this analysis. This section introduces the architecture, training and testing, and performance evaluation of the IN.

Interaction network description
An IN is a type of graph neural network [61] that takes graphs, a data structure composed of nodes and edges, as input. Nodes can represent objects, and edges can represent relations between nodes. With these features, INs are capable of calculating the interactions between different objects in a multibody system and thus predicting the final states of all physical objects in the system. We apply an IN to the analysis of CMS events by treating each collision as a multibody interaction. The unique topology of LLP decays produces relationships between tracks that an IN can be trained to identify. In this analysis, we train an IN to calculate the relationships between tracks and to learn that tracks resulting from LLP decays originate from displaced spatial points. As a result, the IN can output a discriminant value for assessing whether a given event contains LLP decays predicted in scenarios beyond the SM.
Given that variables associated with the vertex reconstruction are very powerful at discriminating between signal and background, including those variables in the IN does not improve the rejection of background events, as the IN would mostly rely on the reconstructed vertices. To enable the IN to exploit as much additional information as possible, the IN is provided with tracking information but no information about the reconstructed vertices themselves. This approach allows the IN to make use of the topologies of tracks in signal and background events without the help of the existing vertex reconstruction algorithm. The output of the IN is used together with variables from reconstructed vertices to improve the search sensitivity, as described in Section 6.
Since the IN is computationally intensive, for each event only the first 50 p_T-ordered tracks are used as input. Tracks are required to satisfy the quality criteria but not the displacement criterion described in Section 4, a choice that gives the IN the opportunity to consider additional tracks from LLP decays whose trajectories coincide with the beam axis. For the split-SUSY signal model, the average number of tracks used by the IN that are assigned to reconstructed signal vertices typically ranges from 4 to 10, depending on the mass splitting and gluino cτ. In the IN, the input tracks are implemented as nodes in the graph, while edges are composed of relations between all pairs of tracks. The tracking information used in the IN includes p_T, η, the azimuthal angle (ϕ), d_xy and its significance, and the longitudinal impact parameter d_z and its significance.
The IN is implemented using standard deep neural network building blocks, such as multilayer perceptrons (MLPs) [62], and matrix operations including addition, subtraction, multiplication, and concatenation. The input is constructed from several matrices: the variable matrix O, which contains the track information; the send relation matrix R_s, which represents the interactions a track sends to other tracks; and the receive relation matrix R_r, which represents the interactions a track receives from other tracks. The interaction described in the IN is an abstract concept that can include any relationship between a pair of tracks.
Both R_r and R_s are structured such that each row represents a single track, and each column corresponds to one of the interactions that tracks can send or receive. We want the IN to consider all possible relationships between tracks, so each element of R_r is set to zero or one such that every track can receive interactions from every track except itself, and each element of R_s is similarly set to zero or one such that every track can send interactions to every track except itself.
A diagram of the architecture of the IN is shown in Fig. 2. The IN calculates the states in which tracks receive and send relations with other tracks as OR_r and OR_s, and combines these calculated elements into a single matrix as OR_r − OR_s. The combined matrix is passed into a five-layer MLP, labeled ϕ_R, with 50 nodes per layer. The matrix E, which is the output of ϕ_R, is interpreted as the effect of all relations having acted on each track. The effect on every track is represented as an array of length 20 in matrix E. The effect matrix E is multiplied by the transpose of matrix R_r to calculate the effect received by each track. The result is combined with the original variable matrix O to form a new matrix C, which contains the original information together with the effects having acted on the tracks. The matrix C is processed by another MLP with two parts. The first part, labeled ϕ_O, has two layers with 50 nodes in each layer and outputs matrix P. The second part, labeled ϕ_output, has three layers with 100 nodes in each layer. It takes matrix P as input and outputs a vector that contains a single MLP output value for each event. Finally, this output value is processed with a sigmoid function to produce an IN output score (S_ML), which predicts whether an event contains an LLP decay or not.

Figure 2: The input variable matrix O is combined with the relation matrices (R_r and R_s) to form a graph that captures interactions between tracks. This graph is subsequently processed by an MLP (ϕ_R) to compute the effect (E) of the interactions. The effect is then combined with R_r and merged with the original input O. To assess the influence (P) of the effect on the original information, it undergoes further processing via another MLP (ϕ_O). Finally, the influence is passed through an MLP (ϕ_output) and a sigmoid function to produce the final output.
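The matrix flow O → E → C → P → S_ML can be sketched with NumPy. The sketch below uses random, untrained weights, shortened layer depths, and sums the per-track influence before ϕ_output (the aggregation step is our assumption; the text does not specify it), so it only illustrates the shapes and the order of operations:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TRK, N_FEAT = 50, 7  # tracks per event, variables per track
pairs = [(i, j) for i in range(N_TRK) for j in range(N_TRK) if i != j]

# Send/receive relation matrices: one column per directed track pair i -> j
R_s = np.zeros((N_TRK, len(pairs)))
R_r = np.zeros((N_TRK, len(pairs)))
for e, (i, j) in enumerate(pairs):
    R_s[i, e] = 1.0  # track i sends the interaction
    R_r[j, e] = 1.0  # track j receives it

def mlp(x, sizes):
    """Stand-in for a trained MLP: fixed random weights, ReLU hidden layers."""
    for k in range(len(sizes) - 1):
        x = (rng.standard_normal((sizes[k + 1], sizes[k])) * 0.1) @ x
        if k < len(sizes) - 2:
            x = np.maximum(x, 0.0)
    return x

O = rng.standard_normal((N_FEAT, N_TRK))      # track variables (pt, eta, ...)
E = mlp(O @ R_r - O @ R_s, [N_FEAT, 50, 20])  # per-pair effects (phi_R, shortened)
C = np.vstack([O, E @ R_r.T])                 # original variables + received effects
P = mlp(C, [N_FEAT + 20, 50, 50])             # per-track influence (phi_O)
out = mlp(P.sum(axis=1, keepdims=True), [50, 100, 1])  # phi_output (shortened)
s_ml = float(1.0 / (1.0 + np.exp(-out)))      # sigmoid -> IN score in (0, 1)
print(s_ml)
```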

Training and testing
As a discriminant that predicts whether an event is signal or background, S_ML is designed to have a value of 0 corresponding to background events and a value of 1 corresponding to signal events. The training process aims to optimize the discriminating power of S_ML. The IN is trained by iteratively minimizing the loss function [63], which is composed of several parts: a binary cross entropy L_bce [63], an L2 regularization L_reg [64], and a distance correlation term L_dcorr [65]. The binary cross entropy L_bce is a commonly used metric for binary discrimination tasks. It quantifies the discrepancy between the predicted outcomes and the desired outcomes. The L2 regularization term L_reg is calculated as the sum of the squared magnitudes of the model parameters and serves to prevent overfitting by constraining these parameters. In background events, a distance correlation (DisCo) [66] method is used to decorrelate S_ML from n_track so that these two variables can be used in the background estimation method described in Section 7. The DisCo method adds L_dcorr to the IN loss function, based on the degree of correlation between S_ML and the n_track value of the vertex with the largest number of tracks in background events. We note that fewer than 0.05% of background events contain more than one reconstructed displaced vertex. The complete IN loss function is then L = L_bce + λ_reg L_reg + λ_dcorr L_dcorr, where λ_reg and λ_dcorr are hyperparameters that control the strength of each restriction. In each iteration, the IN processes all training sample events in batches of 128 events and updates the MLP parameters after processing each batch, using the Adam algorithm [67] to minimize the loss function. The matrix elements of R_r and R_s are not updated during the training, since they are only used to establish the relationships between tracks.
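The distance-correlation statistic underlying the DisCo term can be computed in a few lines of NumPy. This is the textbook (biased) sample estimator, not the exact implementation used in the training:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of 1-D arrays: close to 0 for independent
    variables, up to 1 for (possibly nonlinear) dependence."""
    def doubly_centered(a):
        d = np.abs(a[:, None] - a[None, :])  # pairwise distance matrix
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A = doubly_centered(np.asarray(x, dtype=float))
    B = doubly_centered(np.asarray(y, dtype=float))
    dcov2 = max(float((A * B).mean()), 0.0)
    denom = float(np.sqrt((A * A).mean() * (B * B).mean()))
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0

rng = np.random.default_rng(1)
s = rng.uniform(size=1000)
print(distance_correlation(s, rng.uniform(size=1000)))  # small: independent
print(distance_correlation(s, s ** 2))                  # large: dependent
```

Penalizing this statistic between S_ML and n_track during training is what flattens the n_track dependence exploited by the background estimation.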
The events used for training, validation, and testing are required to have at least one reconstructed vertex with n_track ≥ 3 and to pass all event preselection requirements except the offline p_T^miss requirement. Hyperparameter values are set to λ_reg = 0.00005 and λ_dcorr = 0.65, which ensures that the model is not overtrained and that S_ML is uncorrelated with n_track. A learning rate of 0.0003 is chosen to avoid the nonconvergence of the training that can occur with a large learning rate. Models with different combinations of hyperparameters were tested, and the trained model that provides the best discriminating power while satisfying the decorrelation requirement between S_ML and n_track is used for all events in all data-taking periods.
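Putting the pieces together, the composite loss L = L_bce + λ_reg L_reg + λ_dcorr L_dcorr can be evaluated for one batch as below (the Adam gradient step is omitted; the function and argument names are ours):

```python
import numpy as np

def in_loss(pred, label, weights, l_dcorr, lam_reg=5e-5, lam_dcorr=0.65):
    """Binary cross entropy + L2 regularization + DisCo penalty.
    pred: IN scores in (0, 1); label: 0/1 targets; weights: list of model
    weight arrays; l_dcorr: precomputed distance-correlation term."""
    eps = 1e-12  # guards the logarithms against pred = 0 or 1
    l_bce = -np.mean(label * np.log(pred + eps)
                     + (1 - label) * np.log(1 - pred + eps))
    l_reg = sum(float(np.sum(w ** 2)) for w in weights)
    return float(l_bce + lam_reg * l_reg + lam_dcorr * l_dcorr)

# Confident, correct predictions with small weights give a small loss.
print(in_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0]),
              [np.ones((2, 2))], l_dcorr=0.0))
```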

Performance
The testing events are used to evaluate the performance of the IN. The S_ML distributions in simulated background events and split-SUSY signal events with n_track = 3, 4, and ≥5 are shown in Fig. 3. The significant differences between the S_ML distributions of background and signal samples show the strong discriminating power of the IN. Tagging events with S_ML > 0.2 as signal-like correctly identifies approximately 97% of signal events while rejecting approximately 97% of background events. The S_ML distributions of signal events with n_track ≥ 5 show an enhancement around S_ML = 0.4 that is caused by the decorrelation term of the IN loss function. Without the DisCo regularization, background events with larger n_track would be more likely to obtain a higher S_ML during training. To achieve the decorrelation, the IN therefore tends to assign a smaller S_ML value to events with larger n_track. Given that events from signal samples with larger mass splittings and lifetimes are more likely to have well-reconstructed vertices with larger n_track, they are also more likely to obtain a smaller S_ML.
To estimate the effect of the decorrelation between S_ML and n_track, we divide events into three regions of S_ML: 0 < S_ML < 0.2, 0.2 < S_ML < 0.6, and 0.6 < S_ML < 1.0. The n_track distributions in the different regions are compared in Fig. 4. As seen in the distributions, background events have similar n_track distributions in the different S_ML regions, which demonstrates the success of the decorrelation technique.
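The closure of this check can be reproduced with toy inputs. Below, n_track is generated independently of S_ML, so the three normalized n_track profiles agree; the region edges are from the text, everything else is toy data:

```python
import numpy as np

def ntrack_profiles(s_ml, n_track, edges=(0.0, 0.2, 0.6, 1.0)):
    """Normalized n_track spectra (bins 3 and above) in each S_ML region."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = n_track[(s_ml >= lo) & (s_ml < hi)]
        counts = np.bincount(sel, minlength=8)[3:].astype(float)
        out.append(counts / max(counts.sum(), 1.0))
    return out

rng = np.random.default_rng(2)
s_ml = rng.uniform(size=30000)
n_track = rng.choice([3, 4, 5], size=30000)  # independent of s_ml by design
p1, p2, p3 = ntrack_profiles(s_ml, n_track)
print(np.max(np.abs(p1 - p3)))               # small: profiles agree
```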

Figure 4: The distribution of n_track in different S_ML regions for simulated background events. Events with 0 < S_ML < 0.2 (blue), 0.2 < S_ML < 0.6 (red), and 0.6 < S_ML < 1.0 (green) are compared. All distributions are normalized to unity. The similar n_track distributions demonstrate that n_track and S_ML are decorrelated.

Event selection and signal efficiency
The signal region contains all events that pass the event preselection, have at least one displaced vertex with n_track ≥ 5, and have S_ML > 0.2. The overall signal selection efficiency, defined as the fraction of generated signal events that pass the signal region selection, is directly related to the track reconstruction efficiency, the vertex reconstruction efficiency, and the ML tagging efficiency. It is important to understand these efficiencies in data and simulation and to account for any observed differences. The procedures are based on methods similar to those of Ref. [24]. The simulated background processes described in Section 3 are used in these studies.
The track reconstruction efficiency is studied as a function of d_BV in data and simulation using events enriched in K0S → π−π+ decays. The K0S decays are selected by requiring events that satisfy the preselection and contain a pair of tracks that form a well-reconstructed and displaced vertex. The vertex is required to be located inside the beam pipe, and the invariant mass of the two tracks associated with the vertex is required to be between 0.490 and 0.505 GeV under the charged-pion mass hypothesis. The d_BV distribution of the resulting K0S vertices is compared between data and simulation to determine potential corrections to the signal simulation. The d_BV distribution in simulation is normalized such that the number of K0S vertices is consistent with data in the 0.5–0.8 mm region, where the tracking efficiency is known to be well modeled and the number of K0S vertices results in a sufficiently small statistical uncertainty in the normalization. The number of reconstructed K0S vertices in data and simulation as a function of d_BV is shown in Fig. 5.
Given that each K0S vertex is composed of two tracks and that failing to reconstruct one track is the dominant cause of the failure to reconstruct a K0S vertex, the track reconstruction efficiency is proportional to the square root of the K0S vertex reconstruction efficiency. The ratio of the data-to-simulation track reconstruction efficiency is found to be between 98 and 100% in all data-taking periods and bins of d_BV. These ratios are used to derive systematic uncertainties in the signal efficiency using the procedure described in Section 8.1.

The vertex reconstruction and ML tagging efficiencies are studied by artificially displacing tracks in data and simulated background events to produce displaced vertices that mimic those generated by LLP decays in signal events. Events are required to satisfy the preselection, which results in a sample dominated by background processes because of the lack of track displacement and vertex reconstruction requirements. All simulated background processes described in Section 3 are used in the study. From the selected events, jets with p_T > 20 GeV and at least three matched PF candidate tracks are considered for further study. These jets are categorized as b jets or light jets based on whether they pass or fail the "tight" working point of the DEEPJET algorithm [68], which has a b tagging efficiency of 55–65% in QCD events with p_T > 30 GeV and a misidentification rate for light jets of 0.1%. To mimic the topology of signal events, two light jets are randomly selected, and the tracks associated with the selected jets are displaced simultaneously from the primary vertex. The direction of the displacement is determined by the vector sum of the selected jet momenta, which is then subjected to a Gaussian smearing with a standard deviation of 2.0 rad in both ϕ and the polar angle θ, while the magnitude of the displacement is randomly sampled from an exponential distribution with cτ of 10 mm, which reproduces the gluino decays in the simulated signal events. Corrections to the angular separation and the number of artificially displaced tracks per jet, derived from the ratios of those distributions between signal events and events with artificial track displacements, are applied to ensure that the artificial vertices describe signal vertices well.
The vertex reconstruction and ML algorithms are then applied to the events with artificially displaced tracks.The vertex reconstruction efficiency is calculated as the fraction of all artificially displaced vertices that are reconstructed within 200 µm of the expected position and satisfy the vertex selections described in Section 4. The ML tagging efficiency is calculated as the fraction of events with a reconstructed vertex that have S ML larger than 0.2.The efficiencies are measured for both data and simulated events, and are compared in Fig. 6.The vertex reconstruction efficiency ratios between data and simulation range from 80 to 97%, depending on different mass splittings, while the ML tagging efficiency ratios range from 91 to 100% for different mass splittings.In contrast, the data-to-simulation efficiency ratios differ by less than a few percent between samples with different gluino masses and cτ values.Therefore, the efficiency ratio value used to calculate correction factors and systematic uncertainties is only dependent on the mass splitting and data-taking period.
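The artificial displacement procedure described above can be sketched in a few lines. This is a minimal illustration, not the analysis code: `sample_displacement` and the placeholder jet momenta are invented names, while the smearing width (2.0 rad on ϕ and θ) and the exponential cτ of 10 mm are taken from the text.

```python
import math
import random

def sample_displacement(jet_momenta, ctau_mm=10.0, angle_sigma=2.0, rng=random):
    """Draw one artificial displacement: the direction is the vector sum of the
    selected jets' momenta, Gaussian-smeared by angle_sigma in both phi and
    theta; the magnitude is sampled from an exponential with mean ctau_mm."""
    px = sum(p[0] for p in jet_momenta)
    py = sum(p[1] for p in jet_momenta)
    pz = sum(p[2] for p in jet_momenta)
    mag = math.sqrt(px * px + py * py + pz * pz)
    phi = math.atan2(py, px) + rng.gauss(0.0, angle_sigma)
    theta = math.acos(pz / mag) + rng.gauss(0.0, angle_sigma)
    r = rng.expovariate(1.0 / ctau_mm)  # mean displacement = ctau
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))
```

By construction, the magnitude of the returned vector is the exponentially sampled distance, so averaging over many draws recovers the 10 mm mean.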
The simulated signal event yield is corrected by the vertex reconstruction efficiency ratio to reproduce the expected behavior in data. The differences between unity and the vertex reconstruction and ML tagging efficiency ratios are used as systematic uncertainties, following the procedures described in Section 8.

Figure 6: The vertex reconstruction efficiency (left) and ML tagging efficiency (right) for artificially displaced vertices in data (black) and simulation (red). In this example, the artificially displaced vertices are corrected to mimic split-SUSY signal events with a gluino mass of 2000 GeV and a neutralino mass of 1800 GeV. The uncertainties are too small to be visible in the plot.

Background estimation
The background events that pass the signal region selection are expected to be dominated by events in which displaced vertices are formed by random crossings of otherwise unrelated tracks. We estimate the number of such events and validate the estimation procedure using regions in data that are chosen to be orthogonal to the signal region and dominated by similar events. The control and validation regions are constructed from all events that pass the event preselection and have at least one reconstructed displaced vertex with n track ≥ 3 but do not pass the signal region selections (n track ≥ 5 and S ML > 0.2). Background vertices with lower track multiplicities are similar to those in the signal region because they result from the same effect, i.e., unrelated tracks randomly crossing each other. However, the probability of forming a background vertex with high track multiplicity is lower, because it is less likely for more tracks to cross the same point in space by coincidence. The events passing these selections are then separated into three bins of n track (n track = 3, 4, or ≥5), and each n track bin is further divided into two bins depending on whether S ML > 0.2 is satisfied. Figure 7 illustrates how the control regions (labeled B, D, E, and F), the validation region (C), and the signal region (A) are arranged in the plane defined by n track and S ML. All control and validation regions are dominated by background events from tt and QCD processes.
The background estimation method takes advantage of the statistical independence of S ML and n track, which is ensured by the DisCo method. Because the fraction of background events that satisfy the S ML requirement is independent of the value of n track, the ratio of events in region E to events in region F should be consistent with the equivalent ratios of regions C to D and of regions A to B, within the statistical uncertainty. The numbers of background events in the signal and validation regions can therefore be estimated as

N_A^pred = N_B (N_E / N_F), (2)

N_C^pred = N_D (N_E / N_F), (3)

where N_X denotes the number of events in region X. To validate the background estimation procedure that is used to estimate the number of events in region A using Eq. (2), we predict the number of events in region C using Eq. (3) and compare this prediction with the observed number of events in region C. The prediction and observation agree within statistical uncertainties.
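Because the fraction of events passing the S ML requirement is independent of n track, the pass-to-fail ratio measured in the 3-track bins transfers directly to the higher track multiplicity bins. A minimal sketch of this prediction with simple Poisson error propagation follows; the event counts are invented for illustration and are not those observed in the search.

```python
import math

def abcd_predict(n_b, n_e, n_f):
    """Predict the yield in a pass-S_ML region from the fail-S_ML yield n_b
    and the 3-track transfer factor n_e/n_f, propagating the statistical
    uncertainty of three uncorrelated Poisson counts."""
    pred = n_b * n_e / n_f
    rel = math.sqrt(1.0 / n_b + 1.0 / n_e + 1.0 / n_f)
    return pred, pred * rel

# Invented counts: B (fail S_ML), E (3-track pass), F (3-track fail).
pred, unc = abcd_predict(n_b=120, n_e=800, n_f=30000)
```

The relative uncertainty is dominated by the smallest count, so a well-populated 3-track transfer factor keeps the prediction precise.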

Systematic uncertainties affecting signal
Systematic uncertainties associated with signal reconstruction are due to uncertainties in the modeling of the track reconstruction efficiency, vertex reconstruction efficiency, ML tagging efficiency, p miss T, choice of PDF, trigger efficiency, pileup, integrated luminosity, and running conditions. The dominant source of systematic uncertainty is associated with the choice of PDF, followed by the modeling of the vertex reconstruction efficiency.
The track reconstruction efficiency impacts the signal efficiency because n track ≥ 5 is one of the signal region selections. In cases where an LLP decays to exactly five charged particles, failure to reconstruct even one of the five tracks will cause the vertex to fall outside of the signal region, so the impact of track reconstruction on such vertices is calculated as the fifth power of the track reconstruction efficiency. On the other hand, for vertices with more than five tracks, failure to reconstruct one of the tracks will not necessarily exclude the vertex from the signal region, as long as at least five tracks remain. Since the track reconstruction efficiencies in data and simulation agree within 2%, the impact of the track reconstruction efficiency is negligible for those vertices. We apply a systematic uncertainty equal to the difference in the fifth power of the track reconstruction efficiency between data and simulation, as observed in the study described in Section 6. The assigned uncertainties range from 6 to 21%, depending on the signal mass, signal lifetime, and data-taking period.
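For a five-track vertex, keeping the vertex in the signal region requires reconstructing all five tracks, so a data-to-simulation efficiency ratio r translates into a relative uncertainty of 1 - r^5. A quick illustration, using a value from the 98-100% range quoted in Section 6:

```python
def five_track_uncertainty(ratio):
    """Relative signal-efficiency uncertainty for a vertex with exactly five
    tracks, given the data-to-simulation single-track efficiency ratio."""
    return 1.0 - ratio ** 5

# A 98% per-track ratio gives roughly a 10% five-track uncertainty.
unc = five_track_uncertainty(0.98)
```

This compounding is why a per-track agreement within 2% still yields vertex-level uncertainties approaching 10% at the low end of the quoted 6-21% range.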
In addition, a systematic uncertainty associated with the correction to the vertex reconstruction efficiency, described in Section 6, is assigned to account for potential remaining differences between data and simulation, discrepancies between artificially displaced events and simulated signal events, differences between different gluino masses and proper decay lengths, and other small effects. The systematic uncertainties are taken to be the differences in vertex reconstruction efficiency between data and simulation observed in the study described in Section 6, with values ranging from 3 to 20%.
Similarly, a systematic uncertainty associated with the ML tagging efficiency is assigned to account for any differences in the behavior of the ML algorithm between data and simulation. The magnitude of this uncertainty, obtained as the difference of the ML tagging efficiency between data and simulation in the study described in Section 6, ranges from negligibly small to 24%.
Potential mismodeling of the jet and unclustered energy scales and of the jet energy corrections impacts the p miss T value, and therefore affects the p miss T selection efficiency. To account for this effect, we assign a systematic uncertainty of up to 8%, derived from the change in signal efficiency when varying the parameters associated with the mismodeling by 1 standard deviation about their mean. Variations of the PDF could result in changes in the selection efficiency in the signal region. The associated uncertainty is evaluated by reweighting simulated signal events using the variation observed between NNPDF replica sets [69]. For most of the signal samples, the resulting PDF uncertainty is within 20%. However, for signal samples with very low signal efficiency, such as those with mass splittings below 50 GeV and cτ values of 0.1 or 1000 mm, the PDF uncertainty is affected by statistical fluctuations and reaches values as large as 85%. The trigger efficiency systematic uncertainty is determined by its statistical uncertainty and by the variation observed when using different data sets, resulting in an uncertainty of 1-6%. The uncertainty associated with the modeling of the pileup distribution is 2-15%. The integrated luminosity measurements for the 2016, 2017, and 2018 data-taking years have individual uncertainties of 1-3% [70][71][72], while the overall uncertainty for the 2016-2018 period is 1.6%. During the 2016 and 2017 data-taking periods, a gradual timing shift in the ECAL was not correctly accounted for in the L1 trigger, resulting in an efficiency drop for events with significant ECAL energy in the high η region. This effect results in an uncertainty in the signal yield of up to 1%.
A summary of systematic uncertainties that affect the signal yield is presented in Table 1. The overall systematic uncertainty is calculated by summing individual components in quadrature under the assumption that there are no correlations between them.
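The quadrature sum assumes the individual sources are uncorrelated. A minimal sketch follows; the component magnitudes are invented placeholders, not the values in Table 1:

```python
import math

def total_uncertainty(components):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(c * c for c in components.values()))

# Illustrative magnitudes only (not the values in Table 1):
sources = {"PDF": 0.20, "vertex reco": 0.12, "track reco": 0.10,
           "ML tagging": 0.05, "pileup": 0.04, "luminosity": 0.016}
total = total_uncertainty(sources)
```

Because the components enter quadratically, the total is dominated by the largest source (here the PDF term), consistent with the hierarchy described in the text.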

Systematic uncertainties in background estimation
Since the background is estimated from data, potential mismodeling of the simulation does not have an impact. The number of events in the signal region is predicted using the number of events in control regions B, E, and F, which are based on 3-track and 5-track control regions. Given that the two variables that define the search regions are statistically independent, the prediction of the number of background events in the signal region based on Eq. (2) should be the same using the 3-track control regions (E and F) or the 4-track control regions (C and D). To validate this assumption, we compare two predictions of the number of background events in the signal region, one based on the 3-track control regions and the other based on the 4-track control regions. We find that the two predictions are compatible within the statistical uncertainty and therefore do not assign an additional systematic uncertainty in the background estimation.
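The closure test described above can be sketched as follows: the signal-region yield is predicted once with the 3-track transfer factor and once with the 4-track one, and the difference is compared with the combined statistical uncertainty. All counts below are invented for illustration.

```python
import math

def predict(n_b, n_pass, n_fail):
    """Yield prediction n_b * (n_pass / n_fail) with Poisson error propagation."""
    pred = n_b * n_pass / n_fail
    unc = pred * math.sqrt(1.0 / n_b + 1.0 / n_pass + 1.0 / n_fail)
    return pred, unc

# Invented counts: B (>=5-track fail), E/F (3-track pass/fail), C/D (4-track pass/fail).
p3, u3 = predict(150, 700, 26000)  # prediction via 3-track transfer factor
p4, u4 = predict(150, 95, 3400)    # prediction via 4-track transfer factor
compatible = abs(p3 - p4) < 2.0 * math.sqrt(u3 ** 2 + u4 ** 2)
```

If `compatible` holds within the statistical precision, as in the search, no additional systematic uncertainty is needed for the background estimate.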

Results and statistical interpretation
A maximum likelihood fit under the background-only hypothesis is performed on all search regions. The systematic uncertainties described in Section 8 are taken to be uncorrelated and are used as nuisance parameters in the fit. The predicted and observed numbers of events in all search regions after the fit are reported in Table 2. In the signal region, nine events are observed while 5.2 ± 0.5 events are predicted, which corresponds to a p-value of 0.089. No significant discrepancy between the background-only fit prediction and the observation is seen. Based on the observed number of events and the background prediction, constraints are set on the parameters of the benchmark signal samples used in the search. Specifically, the 95% confidence level (CL) upper limit on the product of the production cross section and the square of the branching fraction (σB 2) of the decay mode is estimated using the CL s method [73][74][75]. The signal yields are determined from simulated signal events, with all the corrections applied; the number of background events is determined purely from data. Data sets from different data-taking periods are treated separately and combined in the final fit. The upper limits are compared with the theoretical prediction of the production cross section and its uncertainty [76], calculated at NNLO+NNLL precision [47], to exclude regions of the parameter space of the benchmark signal models. The branching fraction for the gluino decay to the targeted final state in split SUSY and GMSB SUSY is assumed to be 100% for all signal samples.
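The quoted p-value of 0.089 comes from the full fit including the background uncertainty. As a rough cross-check, the plain Poisson probability of observing nine or more events with a mean of 5.2 can be computed directly and lands near 0.08, illustrating why the excess is not significant:

```python
import math

def poisson_tail(n_obs, mu):
    """P(N >= n_obs) for a Poisson-distributed count with mean mu."""
    cdf = sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n_obs))
    return 1.0 - cdf

p = poisson_tail(9, 5.2)  # roughly 0.08
```

The small difference from 0.089 is expected, since this sketch ignores the ±0.5 uncertainty on the background prediction that the fit marginalizes over.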
Figure 8 shows the upper limits and exclusion curves for the split-SUSY model as a function of gluino mass, gluino cτ, and gluino-neutralino mass splitting. The search is most sensitive when the gluino cτ is approximately 10 mm and the gluino-neutralino mass splitting is large, but the sensitivity extends to gluino cτ values from 0.1 to 1000 mm and mass splittings as low as 20 GeV. The decreasing sensitivity at low cτ values is driven by the decreasing displaced vertex reconstruction efficiency, while the decreasing sensitivity at the highest cτ values is driven by the constraint that the displaced vertex must be within the CMS beam pipe. Figure 9 shows the upper limits and exclusion curves for the GMSB SUSY model as a function of gluino mass and cτ, which exhibit a similar dependence on the gluino lifetime to those associated with the split-SUSY model.

Figure 8: Left: the 95% CL s upper limit on the product of the cross section and branching fraction squared for the split-SUSY signal model with a mass splitting of 100 GeV, shown as a function of gluino mass and cτ. Right: the 95% CL s upper limit on the product of the cross section and branching fraction squared for the split-SUSY model with a cτ of 10 mm, shown as a function of gluino mass and mass splitting. For both plots, the observed (solid black) and expected (dashed red) exclusion curves are shown.
The observed upper limits are weaker than expected by about 1.35 standard deviations, since more events are observed in the signal region than expected. As this search uses only one signal region, the upper limits for different signal parameters are calculated using the same region, which makes the difference between the observed and expected upper limits similar for different signal parameters. For the split-SUSY benchmark signal model, the search excludes gluinos with cτ in the range 1-100 mm and masses below 1800 GeV for mass splittings of 100 GeV with respect to the neutralino. Also, for mass splittings above 50 GeV, the search excludes gluinos with masses below 1600 GeV and cτ between 1 and 30 mm. For gluino masses of 1400 GeV and neutralino masses of 1300 GeV, this search excludes production cross sections that are a factor of 2-100 smaller than those excluded by Ref. [22] and 10-100 smaller than those excluded by Ref. [29], depending on the gluino lifetime. For the GMSB SUSY signal model, the search achieves upper limits on the product of the cross section and branching fraction squared as low as 1 fb for gluinos with cτ ranging from 0.1 to 1000 mm, and excludes gluinos with cτ in the range 0.3-100 mm and masses below 2240 GeV.

Summary
A search for the production of long-lived particles that decay to at least one displaced vertex with missing transverse momentum in proton-proton collisions at a center-of-mass energy of 13 TeV collected by the CMS detector has been presented. The analysis extends the previous CMS search [24] by improving the sensitivity to events with low total jet energy, targeting events with as few as one displaced vertex, and introducing a dedicated machine learning algorithm that reduces the number of background events in the signal region by 94%. Split supersymmetry (SUSY) and gauge-mediated SUSY breaking are used as benchmark signal models for statistical interpretations in this search.

Figure 9: The 95% CL s upper limit on the product of the cross section and branching fraction squared for the GMSB SUSY signal model, shown as a function of gluino mass and cτ. The observed (solid black) and expected (dashed red) exclusion curves are shown.
At 95% confidence level, the search excludes long-lived gluinos predicted by the split-SUSY model with masses below 1800 GeV and mean proper decay lengths in the range of 1 to 100 mm, when the mass splitting is 100 GeV. For mass splittings above 50 GeV, gluinos with masses below 1600 GeV and mean proper decay lengths between 1 and 30 mm are excluded. For the gauge-mediated SUSY breaking model, gluinos with masses below 2200 GeV and mean proper decay lengths between 0.3 and 100 mm are excluded. This search is the first CMS search that shows sensitivity to hadronically decaying long-lived particles from signals with mass differences between the gluino and neutralino below 100 GeV. It sets the most stringent limits to date for split-SUSY models and for GMSB gluinos with proper decay lengths less than 6 mm.

Figure 1 :
Figure 1: Diagrams of the split-SUSY model (left) and GMSB SUSY model (right). In the split-SUSY model, a pair of long-lived gluinos is produced, and each decays to two quarks and one neutralino. In the GMSB SUSY model, a pair of long-lived gluinos is produced, and each decays to a gluon and a gravitino.

Figure 2 :
Figure 2: An illustration of the architecture of the IN, where the flow of data is indicated by arrows. Rectangular boxes represent data matrices, while diamonds represent multilayer perceptrons (MLPs). The original input information (O) is integrated with relation matrices (R r and R s) to form a graph that captures interactions between tracks. This graph is subsequently processed by an MLP (ϕ R) to compute the effect (E) of the interactions. The effect is then combined with R r and merged with the original input O. To assess the influence (P) of the effect on the original information, it undergoes further processing via another MLP (ϕ O). Finally, the influence is passed through an MLP (ϕ output) and a sigmoid function to produce the final output.

Training and validation events are required to have 80 < p miss T < 200 GeV to ensure orthogonality with testing events, which have p miss T > 200 GeV. Within the training and validation events, 85% are used for training and 15% for validation. To avoid any bias introduced by potential mismodeling of simulated events, 17 067 data events and 31 165 simulated background events are mixed together and labeled as background events during the training of the IN. The simulated background events are drawn from the different simulated SM processes in proportion to their cross sections. To avoid potential bias, data events that include vertices with n track ≥ 5 are excluded from the training and validation samples. Split-SUSY signal samples with different lifetimes and masses are combined and labeled as signal events in the training, and events from different data-taking periods are combined during training. In total, 91 013 events are used for training and 17 065 events are used for validation.

Figure 3 :
Figure 3: Distributions of S ML for data, simulated background, and signal. Events with n track of 3 (upper), 4 (middle), and ≥5 (lower) are shown individually. The background simulation is shown for illustration and is not used in the search. The distributions are shown for split-SUSY signals with a gluino mass of 2000 GeV and neutralino masses of 1900 GeV (left) and 1800 GeV (right). Different gluino proper decay lengths and mass differences between the gluino and neutralino are shown as cτ and ∆m in the legend. All distributions are normalized to unity.

Figure 5 :
Figure 5: The distribution of d BV in K 0 S vertices in data (black) and simulation (purple). The lower panel shows the ratio between data and simulation.

Figure 7 :
Figure 7: A schematic diagram of the signal (red), validation (yellow), and control (gray) regions.The letter in each box corresponds to the region label described in the text.

Table 1 :
Summary of systematic uncertainties that affect the signal yield. The magnitude of each systematic uncertainty varies by data-taking period and signal parameters, so a range of values is given in each case.

Table 2 :
Number of predicted and observed events in the control, validation, and search regions. Predictions are calculated using Eqs. (2) and (3) and fitting the data under the background-only hypothesis. Regions are organized by S ML and n track values, and region names corresponding to Fig. 7 are given in parentheses. The predicted number of events that pass the S ML selection and the observed numbers of events that pass or fail the S ML selection are shown in separate rows.