Deep learning for the R-parity violating supersymmetry searches at the LHC

Supersymmetry with hadronic R-parity violation, in which the lightest neutralino decays into three quarks, is still weakly constrained. This work aims to improve the current search for this scenario by using the boosted decision tree method with additional information from jet substructure. In particular, we find that a deep neural network performs well in characterizing the neutralino jet substructure. We first construct a Convolutional Neural Network (CNN) that is capable of tagging the neutralino jet in any signal process by using the idea of the jet image. When applied to pure jet samples, such a CNN outperforms the N-subjettiness variable by a factor of a few in tagging efficiency. Moreover, we find that the method which combines the CNN output and the jet invariant mass performs better and is applicable to a wider range of neutralino masses than the CNN alone. Finally, the ATLAS search for the signal of gluino pair production with subsequent decay $\tilde{g} \to q q \tilde{\chi}^0_1 (\to q q q)$ is recast as an application. In contrast to the pure sample, the heavy contamination among jets in this complex final state renders the discriminating powers of the CNN and N-subjettiness similar. By analyzing the jet substructure in events that pass the ATLAS cuts with our CNN method, the exclusion limit on the gluino mass can be pushed up by $\sim200$ GeV for a neutralino mass of $\sim 100$ GeV.


I. INTRODUCTION
As one of the most promising candidates for new physics beyond the Standard Model (SM), supersymmetry (SUSY) [1,2] has been extensively searched for at the LHC [3,4]. With the $Z_2$ R-parity [5], the Lightest Supersymmetric Particle (LSP) can be a weakly-interacting-massive-particle dark matter (DM) candidate with the right DM relic density [6]. Moreover, R-parity conserving (RPC) SUSY can be discovered at hadron colliders by looking for particles with high transverse momenta and large missing energy in the final state. Gluino/squark masses have been excluded up to a couple of TeV [7,8] at the current stage of the LHC.
However, R-parity is not mandatory in SUSY models. In contrast to the RPC scenario, where the bounds on the colored sparticles have been pushed up to the region without any SM backgrounds, some R-parity violating (RPV) scenarios are still weakly constrained. Thus, improvements on the RPV searches are desired and possible. In particular, the bounds on the RPV operators $U^c D^c D^c$, where $U^c$ and $D^c$ denote the right-handed up-type and down-type quark superfields respectively, are quite weak due to the large hadronic activity expected at the LHC [9][10][11][12][13][14]. In our recent work [15], the status of the LHC reach on stop and sbottom masses with this kind of $U^c D^c D^c$ operator was studied. We found that stops and sbottoms with mass ∼ 500 GeV are still not fully excluded. One important reason is that the RPV bounds at the current stage are usually studied in the simplified model framework, so not all of the information of a specific signal is utilised.
In the hadronic RPV case, the decay products of a boosted heavy sparticle will be collimated, forming a single fat jet in the detector. The information from the fat jet substructure (see Refs. [16][17][18][19][20][21] for reviews) was found to be useful in improving the search sensitivities, e.g. neutralino jet substructure [22] or top squark jet substructure [23,24]. To characterize the jet substructure, high-level kinematic variables such as mass-drop [25] and N-subjettiness [26] are traditionally defined on the jet. On the other hand, all information of a jet can be inferred from the electromagnetic and hadronic calorimeters, with the basic observables being the position in the η − φ plane and the energy deposit of each calorimeter cell. By identifying each cell as a pixel, and the energy deposit in the cell as the intensity (or grayscale color) of that pixel, the jet can be naturally viewed as a digital image. Recent developments in computer vision then provide helpful tools for tagging the nature of a jet with low-level inputs. There are a number of works that use the jet image to discriminate hadronic W/Z jets [27][28][29][30] and top quark jets [31][32][33] from QCD jets, and to discriminate quark jets from gluon jets [34,35]. These studies show that jet taggers based on computer vision perform comparably to or even slightly better than those based on high-level kinematic variables. Some improved algorithms have been proposed in Refs. [36][37][38]. It has been realized recently that the idea of the jet image suffers from low efficiency attributed to the sparseness of the image. Machine learning techniques other than image recognition have also been considered, such as using Recursive Neural Networks [39,40], taking an ordered sequence of jet constituents as input [41], and working on the Lorentz vectors of jet constituents [42].
In this work, we try to improve a realistic RPV SUSY search at the LHC by using a deep learning technique, namely a Convolutional Neural Network (CNN) that tags the sparticle jet. The signal process under consideration is gluino pair production, in which each gluino decays into two quarks and a neutralino. The neutralino subsequently decays into three quarks through the hadronic RPV operators $U^c D^c D^c$. The main task of the CNN is to discriminate the boosted neutralino jet in this signal process from the QCD jets in SM background processes.
First, there is no prototype in the SM that produces the same three-prong structure as the neutralino jet. Second, the mass of the neutralino is an unknown parameter, so we will show how the CNN tagging efficiency changes when the neutralino mass differs from the one that the CNN is trained on. Our CNN is first trained on a model-independent sample with only a single neutralino jet in the final state. It is then applied to each jet in both the full signal and background events that pass all selections of the ATLAS search. According to the scores that the CNN assigns to these jets, the signal and background can be separated further, leading to better search sensitivity.
The paper is organized as follows. In Sec. II, we briefly review the concept of the CNN and introduce some of the relevant techniques. In Sec. III, the architecture of the CNN adopted in this paper is given. Sec. IV discusses the model-independent training process of the CNN. Its application to a realistic RPV gluino search is studied in Sec. V. Our conclusion is given in Sec. VI.

II. BRIEF REVIEW OF CONVOLUTIONAL NEURAL NETWORK
The CNN is a powerful tool in image recognition. It is a specialized neural network for processing data with a grid-like topology, such as pictures and time-series data, and it has been widely used in computer vision.
The CNN mainly has three kinds of layers: convolutional layers, pooling layers, and fully connected layers. The convolution, in the mathematical sense, can be expressed by
$$ s(t) = \int x(a)\, w(t-a)\, da . $$
In a convolutional layer, $x(a)$ is referred to as the input, $w(t-a)$ corresponds to the convolutional kernel, and the output $s(t)$ is called the feature map. One can apply many convolutions to the input data in parallel, with each convolutional kernel used to extract one particular feature of an image. In fact, the input data is not just a grid of real numbers; rather, it is a tensor of several layers of grid values. For example, in our work, the input includes three layers: the energy distribution matrix of all particles, the energy distribution matrix of charged particles, and the matrix of the number of charged particles. These give a 3-D tensor as our input. Then, the convolution between the input $V$ and the kernel $K$ can be written as
$$ Z_{i,j,k} = \sum_{l,m,n} V_{l,\,j+m-1,\,k+n-1}\, K_{i,l,m,n} , $$
where $l$ is the layer index, $m$, $n$, $j$, and $k$ are the pixel indices of each layer, and $i$ is the index of the convolutional kernel. The convolution can also be done with a stride $s$:
$$ Z_{i,j,k} = \sum_{l,m,n} V_{l,\,(j-1)\times s+m,\,(k-1)\times s+n}\, K_{i,l,m,n} . $$
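As an illustration of the strided convolution over a multi-layer input described above, the following is a minimal NumPy sketch. It assumes no zero-padding at the image border (the padding scheme is not specified in the text) and uses 0-based indices; the function name `conv_layer` and the random test tensors are hypothetical.

```python
import numpy as np

def conv_layer(V, K, stride=1):
    """Z[i, j, k] = sum_{l,m,n} V[l, j*s + m, k*s + n] * K[i, l, m, n] (0-based).

    V: input tensor of shape (layers, height, width)
    K: kernels of shape (n_kernels, layers, kernel_h, kernel_w)
    Assumption: no padding, so the output shrinks with the kernel size.
    """
    n_kernels, n_layers, kh, kw = K.shape
    assert V.shape[0] == n_layers, "kernel depth must match the number of input layers"
    out_h = (V.shape[1] - kh) // stride + 1
    out_w = (V.shape[2] - kw) // stride + 1
    Z = np.zeros((n_kernels, out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            patch = V[:, j * stride:j * stride + kh, k * stride:k * stride + kw]
            # Contract the patch with every kernel over the (l, m, n) indices.
            Z[:, j, k] = np.tensordot(K, patch, axes=([1, 2, 3], [0, 1, 2]))
    return Z

rng = np.random.default_rng(0)
V = rng.random((3, 30, 30))      # three-layer 30 x 30 jet image (random stand-in)
K = rng.random((64, 3, 6, 6))    # 64 kernels of size 6 x 6
Z = conv_layer(V, K)             # one feature map per kernel
Z_strided = conv_layer(V, K, stride=2)
```

Each of the 64 kernels yields one feature map; with unit stride and no padding a 30 × 30 layer shrinks to 25 × 25.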
Once the convolutional result is obtained, $Z_{i,j,k}$ is passed to the activation function. The idea of an activation function is inspired by biological neurons that fire if a certain threshold is reached. Modern applications of the CNN use the rectified linear unit (ReLU) as the activation function because it is computationally fast:
$$ \mathrm{ReLU}(x) = \max\{0, x\} . $$
Another typical feature of the CNN is the pooling layer. In contrast to the convolution, a pooling function replaces the output of the network at a certain location with a summary statistic of the nearby pixels. It is a down-sampling process. The most commonly used pooling function is max-pooling, which returns the maximum pixel value in a rectangular neighbourhood. The pooling layer not only makes the CNN relatively robust against noise but also makes it approximately invariant under small translations, rotations and scalings. After the pooling layer, a dropout step is applied to prevent over-fitting. Dropout means that a random subset of units is ignored during the training process.
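The activation, pooling and dropout steps can be sketched in a few lines of NumPy. This is only an illustration: the rescaling of surviving units in `dropout` ("inverted dropout") is a common convention assumed here, not something stated in the text, and all function names are hypothetical.

```python
import numpy as np

def relu(x):
    # ReLU(x) = max{0, x}, applied element-wise.
    return np.maximum(0.0, x)

def max_pool(Z, size=2, stride=1):
    # Replace each size x size window by its maximum value.
    n, h, w = Z.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((n, out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            window = Z[:, j * stride:j * stride + size, k * stride:k * stride + size]
            out[:, j, k] = window.max(axis=(1, 2))
    return out

def dropout(x, rate, rng):
    # Training-time only: ignore a random subset of units.
    # Assumption: survivors are rescaled by 1 / (1 - rate) (inverted dropout).
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

Z = np.array([[[-1.0, 2.0, 0.0],
               [3.0, -4.0, 1.0],
               [0.0, 5.0, -2.0]]])
activated = relu(Z)
pooled = max_pool(activated, size=2, stride=1)
unchanged = dropout(pooled, rate=0.0, rng=np.random.default_rng(0))
```

With a 2 × 2 window and unit stride, the 3 × 3 toy feature map is reduced to 2 × 2.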
Next, the feature maps from the last pooling layer are read by a fully connected neural network (FCNN), which typically consists of several layers: an input layer, hidden layers and an output layer. The first step is to flatten the feature maps, which transforms each feature map of size $n \times n$ into a vector of size $n^2 \times 1$. If there are $k$ feature maps, the input layer of the FCNN requires $k n^2$ nodes to read all inputs. The output of each layer in the FCNN is the weighted sum of all its inputs, which is then fed to the ReLU activation function,
$$ h_j = \mathrm{ReLU}\Big( \sum_i W_{ij}\, a_i + b_j \Big) , $$
where $a_i$ are the inputs of the layer, and $W_{ij}$ and $b_j$ are called the weights and biases. In the last layer, the FCNN gives the probability of each class; for a classification problem with $c$ classes, it outputs $c$ such probabilities. Finally, training is needed to optimize the weights and biases of all nodes in the CNN. This is done by minimizing a loss function $L(\theta)$ that compares the predictions with the targets over the training sample, where $\theta$ corresponds to the model parameters (weights and biases), $x_i$ is the input, and $y(\theta; x_i)$ and $y_i$ are the predicted result and the target, respectively. An improved stochastic gradient descent with the back-propagation algorithm (the NAdam algorithm) can be used to find the minimum of the loss function.
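The flattening step and the forward pass through a small FCNN can be illustrated as follows. This is a sketch under explicit assumptions: the softmax output and the cross-entropy loss are chosen here for illustration only and are not necessarily the forms adopted in this work, and the layer sizes and random weights are hypothetical.

```python
import numpy as np

def dense_relu(a, W, b):
    # Weighted sum of all inputs followed by ReLU.
    return np.maximum(0.0, a @ W + b)

def softmax(z):
    # ASSUMPTION: softmax normalisation of the output nodes (illustrative choice).
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, target):
    # ASSUMPTION: cross-entropy loss for one example with integer class label.
    return -np.log(p[target])

rng = np.random.default_rng(1)
feature_maps = rng.random((4, 5, 5))        # k = 4 maps of size n x n = 5 x 5
a = feature_maps.reshape(-1)                # flattened input with k * n^2 = 100 nodes
W1 = rng.normal(scale=0.1, size=(a.size, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 2))
b2 = np.zeros(2)

hidden = dense_relu(a, W1, b1)              # hidden layer with 8 nodes
probs = softmax(hidden @ W2 + b2)           # two classes: signal, background
loss = cross_entropy(probs, target=0)
```

Training would adjust `W1`, `b1`, `W2` and `b2` by gradient descent so that the loss averaged over the training sample decreases.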

III. THE CNN ARCHITECTURE
There are many existing CNN architectures that have proved to be very successful on the PASCAL Visual Object Classes and Caltech image classification tasks, such as VGG16 and ResNet50. The images used in those analyses are typically large, while in our case, due to the limited angular resolution of detectors at hadron colliders, the jet image is usually smaller than 30 × 30 pixels. Those architectures are found to be not very efficient for jet classification. On the other hand, the CIFAR-10 architecture is optimized for small-image classification, with an image size of 32 × 32, and it can handle 10 classes. This is quite similar to our situation. Inspired by the CIFAR-10 architecture, we sketch our CNN architecture in Fig. 1.
The input consists of three layers, defined as the energy distribution of all particles, the energy distribution of charged particles and the number of charged particles in the calorimeter cells. The jet image preprocessing is described in more detail later. These data are then passed through two iterations of two convolutional layers with ReLU activation followed by a max-pooling layer. The size and the total number of convolutional kernels (also called filters) in each convolutional layer are free parameters; in practice, the best choice has to be found by trial and error. In the figure, in the first iteration, the input is convolved twice by 64 filters of the same size 6 × 6, followed by max-pooling with a filter of size 2 × 2 and unit stride. In the second iteration, the size of the convolutional filters is reduced to 3 × 3, while the number of filters in each convolutional layer and the pooling filter remain the same as in the first iteration. The feature maps are then flattened and read by the FCNN. There are 512 nodes in the hidden layer of the FCNN, where the ReLU activation function is adopted. The final output contains two nodes with the sigmoid activation function. The value of each node is normalized to lie in [0,1], so that it can characterize the probability of either signal or background.
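To make the layer sequence concrete, the feature-map sizes through the network can be traced with a few lines of plain Python. This sketch assumes a 30 × 30 input and no zero-padding in the convolutional and pooling layers; the padding scheme is not stated in the text, so the intermediate sizes below are illustrative only. The names `out_size` and `trace_shapes` are hypothetical.

```python
def out_size(n, kernel, stride=1):
    # Output width of an unpadded convolution or pooling window.
    return (n - kernel) // stride + 1

def trace_shapes(n=30, n_filters=64):
    shapes = [("input", (3, n, n))]
    # Two iterations: (conv, conv, max-pool) with 6x6 and then 3x3 kernels.
    for block, kernel in (("block1", 6), ("block2", 3)):
        for c in (1, 2):
            n = out_size(n, kernel)
            shapes.append((f"{block}_conv{c}", (n_filters, n, n)))
        n = out_size(n, 2, stride=1)   # 2x2 max-pooling with unit stride
        shapes.append((f"{block}_pool", (n_filters, n, n)))
    shapes.append(("flatten", (n_filters * n * n,)))
    shapes.append(("hidden", (512,)))
    shapes.append(("output", (2,)))
    return shapes

shapes = dict(trace_shapes())
for name, shape in trace_shapes():
    print(f"{name:14s} {shape}")
```

Under these assumptions the flattened vector read by the FCNN has 64 × 14 × 14 = 12544 entries; a different padding choice would change this number but not the structure.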

IV. MODEL INDEPENDENT TRAINING AND TESTING
Our goal is to build a CNN that can distinguish the jet image of a neutralino from the jet images of quarks and gluons, so that signal processes with neutralinos in the final state can be better separated from backgrounds. To make our CNN a general neutralino jet recognizer that is not specific to any particular production process, the training of the CNN is based on signal event samples with only one visible neutralino in the final state, which subsequently decays into three quarks. Throughout this work, the hard-scattering signal and background events as well as the neutralino decay are simulated by the MadGraph5_aMC@NLO program [43]. The Pythia8 package [44] is used to perform the parton shower and hadronization. The detector effects are simulated by Delphes3 [45] with the ATLAS configuration card, in which the jet reclustering algorithm is implemented via the FastJet [46] software. Our CNN is implemented in Python, using the deep learning library Keras [47].
The training and testing samples are generated and processed as follows. Firstly, the signal events with a single visible neutralino jet are generated by the process $pp \to \tilde{\chi}^0_1 \tilde{\chi}^0_2$ in the SUSY model, with $\tilde{\chi}^0_2 \to bcs$ through the $U^c_2 D^c_2 D^c_3$ operator [15]. The $\tilde{\chi}^0_1$ is assumed to be stable here, so it leaves nothing inside the detector. As a benchmark, we choose the mass of $\tilde{\chi}^0_2$ to be 100 GeV. Its transverse momentum is required to be $p_T(\tilde{\chi}^0_2) > 100$ GeV so that its decay products are collimated and behave as a jet in the detector. Furthermore, the neutralino jet image varies with the polar angle (or pseudorapidity) of the neutralino. To take this effect into account, two classes of signal event samples are generated: one with the requirement $|\eta(\tilde{\chi}^0_2)| < 0.1$ (central sample) and the other allowing a much larger pseudorapidity, $|\eta(\tilde{\chi}^0_2)| < 2.5$ (wide sample). Secondly, the background events for training and testing are generated by $pp \to j \tilde{\chi}^0_1 \tilde{\chi}^0_1$ in the SUSY model, where $j$ can be either a quark or a gluon and $\tilde{\chi}^0_1$ is stable in the detector. As in the signal event generation, the transverse momentum of $j$ is required to be $p_T(j) > 200$ GeV, and two classes of background samples with cuts of $|\eta(j)| < 0.1$ and $|\eta(j)| < 2.5$ are defined. It should be noted that initial state radiation and multiple parton interactions have been turned off in Pythia8 for both signal and background event generation, in order to suppress their contamination of the jet image. Thirdly, in both signal and background events, jets are reconstructed by the anti-$k_t$ algorithm [48] with radius parameter R = 1.0. The minimal transverse momentum of the target jet is 100 GeV. An event is dropped if it contains no jet with $p_T(j) > 100$ GeV. If there is more than one jet with $p_T(j) > 100$ GeV in an event, the jet with the highest $p_T$ is chosen. For signal events, we also require the selected jet to lie within $\Delta R < 1.0$ of the parton-level $\tilde{\chi}^0_2$.
At this stage, each event is associated with a single jet, which is expected to be a neutralino jet (QCD jet) for a signal (background) event. Next, we need to convert the jet information into a grid image. Given a jet, its hardest constituent is located on the η − φ plane. A grid with cell size 0.1 × 0.1 and 30 × 30 cells, centered at the hardest constituent, is then defined. Based on the grid and the jet constituent information, we define three different layers for the jet image: (1) a layer showing the energy grid of all jet constituents, where the energies of jet constituents that belong to the same cell are added up; (2) the same as the first layer, but with only the energies of charged jet constituents taken into account; (3) a layer that counts the number of charged jet constituents in each cell. Since the CNN is found to be most efficient in dealing with numbers in [0,1], all numbers in each layer are divided by the maximal value in that layer, e.g. the maximal cell energy in the first layer. We do not apply any further image preprocessing procedures, such as rotation and flipping, because they were found to decrease the performance of our CNN.
Finally, to use our data set more efficiently (we have generated one million signal and background events for training), 30 epochs are used in the training process. To avoid the over-training problem, an independent set of one million signal and background events is used for testing.
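The construction of the three-layer jet image can be sketched with NumPy as follows. Several details are assumptions made for illustration: the hardest constituent is taken to be the one with the largest energy, the centre cell is placed at index 15 of the 30 × 30 grid, and constituents falling outside the grid are dropped. The function name `jet_image` and the toy constituents are hypothetical.

```python
import numpy as np

def jet_image(eta, phi, energy, charge, n_pix=30, step=0.1):
    """Return a (3, n_pix, n_pix) image: all energy, charged energy, charged multiplicity."""
    eta, phi, energy, charge = map(np.asarray, (eta, phi, energy, charge))
    hardest = np.argmax(energy)              # assumption: "hardest" = highest energy
    d_eta = eta - eta[hardest]
    # Wrap the azimuthal difference into [-pi, pi).
    d_phi = (phi - phi[hardest] + np.pi) % (2 * np.pi) - np.pi
    ix = np.floor(d_eta / step).astype(int) + n_pix // 2
    iy = np.floor(d_phi / step).astype(int) + n_pix // 2
    inside = (ix >= 0) & (ix < n_pix) & (iy >= 0) & (iy < n_pix)
    charged = inside & (charge != 0)

    image = np.zeros((3, n_pix, n_pix))
    # np.add.at accumulates correctly when several constituents hit the same cell.
    np.add.at(image[0], (ix[inside], iy[inside]), energy[inside])
    np.add.at(image[1], (ix[charged], iy[charged]), energy[charged])
    np.add.at(image[2], (ix[charged], iy[charged]), 1.0)

    # Normalise each layer by its own maximum so that all entries lie in [0, 1].
    for layer in image:
        peak = layer.max()
        if peak > 0:
            layer /= peak
    return image

# Toy jet: one hard charged particle plus a neutral and a charged one sharing a cell.
img = jet_image(eta=[0.0, 0.25, 0.25],
                phi=[1.0, 1.13, 1.13],
                energy=[50.0, 20.0, 10.0],
                charge=[1, 0, -1])
```

In the toy example the shared cell holds 30 of total energy and 10 of charged energy, so after normalisation its values are 0.6, 0.2 and 1.0 in the three layers.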
As explained in Sec. II, there are a number of free parameters in the CNN that can only be optimized through trial and error, including the sizes and numbers of convolutional kernels in the convolutional layers, the dropout rates after the two iterations and in the FCNN, the number of nodes in the hidden layer of the FCNN and the learning rate of the NAdam algorithm. We find that the performance of the CNN depends only mildly on these parameters. In the left panel of Fig. 2, the performance of CNNs with 8, 16, 32 and 64 convolutional filters per convolutional layer is shown (the same number is adopted in all convolutional layers). The CNNs with more than 16 convolutional kernels perform equally well, and they are slightly better than the one with 8 convolutional kernels. To obtain those results, we have taken the size of the convolutional kernel to be 6 × 6, and the dropout rate to be 0.25 in the two iterations and 0.5 in the FCNN. The number of nodes in the hidden layer is 512 and the learning rate is taken to be 0.001. This parameter choice will be used throughout this work. Note that we have defined two sets of CNNs, which are trained and tested on the central sample and the wide sample of signal and background events, respectively. The results presented in the left panel correspond to the CNN trained on the wide sample and applied to another independent wide sample. In the right panel, to characterize the dependence of the jet image features on the jet pseudorapidity, we show the performance of those two CNNs (both with 64 filters in all convolutional layers) on different samples. As expected, the central jet (|η(j)| < 0.1) is easier to tag than the jet within the wide pseudorapidity range (|η| < 2.5). The CNN trained and tested on the central sample does not work for tagging the neutralino jet in the wide sample, mainly because the features captured by the CNN in the central sample are not useful for the wide sample.
On the other hand, the CNN trained and tested on the wide sample performs well in tagging the neutralino jet in the central sample, even though it is slightly worse than the CNN that is trained and tested directly on the central sample. This means we do not have to limit our analysis to the phase space with the target jet in the central region. This is especially useful in a realistic signal search at the LHC, since more signal events can be retained. In the following, we will keep using the CNN that is trained and tested on the wide sample, with 64 filters in each convolutional layer. We now compare the performance of our CNN with that of high-level jet substructure variables. Among these, the N-subjettiness is a general and effective discriminating variable that characterizes the multi-prong structure of a jet. It is defined as
$$ \tau_N = \frac{1}{d_0} \sum_k p_{T,k} \min\{ R_{1,k}, R_{2,k}, \dots, R_{N,k} \} , \qquad d_0 = \sum_k p_{T,k} R_0 , $$
where $k$ runs over all constituent particles in a given jet, $p_{T,k}$ are their transverse momenta, $R_{J,k}$ is the distance between a candidate subjet $J$ and a constituent particle $k$ in the η − φ plane, and $R_0$ is the characteristic jet radius used in the original jet clustering algorithm. A jet with $N$ prongs will have $\tau_N \sim 0$ when all of its constituents are aligned with the candidate subjets, while $\tau_I \gg 0$ for $I < N$ because there are constituents distributed away from the candidate subjet directions. As a result, the variable $\tau_N/\tau_{N-1}$ is found to be efficient in tagging jets with N-prong structure. In our case, the neutralino jet substructure can be tagged by $\tau_3/\tau_2$. The performance of the N-subjettiness technique is shown by the red solid line in Fig. 3. We find that the performance of our CNN (represented by blue dots) is a few times better than that of the N-subjettiness. Moreover, the jet invariant mass is a powerful discriminating variable that is independent of N-subjettiness. To combine the discriminating power of both variables, the boosted decision tree (BDT) method [49,50] is adopted.
The performance of the combination of N-subjettiness and jet invariant mass is given by the blue solid line, which shows a tagging efficiency similar to that of the CNN alone.
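For reference, the N-subjettiness variable discussed above can be evaluated with a short NumPy function once the candidate subjet axes are known. This sketch assumes the standard form $\tau_N = \sum_k p_{T,k} \min_J R_{J,k} / \sum_k p_{T,k} R_0$ and takes the axes as given input; how the axes are found (for example by an exclusive clustering of the jet constituents) is not specified here. The function name and the toy three-prong jet are hypothetical.

```python
import numpy as np

def n_subjettiness(pt, eta, phi, axes, R0=1.0):
    """tau_N for constituents (pt, eta, phi) and N candidate axes [(eta, phi), ...]."""
    pt, eta, phi = map(np.asarray, (pt, eta, phi))
    axes = np.asarray(axes, dtype=float)                  # shape (N, 2)
    d_eta = eta[:, None] - axes[None, :, 0]
    d_phi = (phi[:, None] - axes[None, :, 1] + np.pi) % (2 * np.pi) - np.pi
    dR = np.hypot(d_eta, d_phi)                           # shape (n_constituents, N)
    d0 = np.sum(pt) * R0
    # Each constituent contributes its pT times the distance to the nearest axis.
    return np.sum(pt * dR.min(axis=1)) / d0

# Toy three-prong jet: every constituent sits exactly on one prong direction.
pt = [100.0, 80.0, 60.0]
eta = [0.0, 0.4, -0.3]
phi = [0.0, 0.3, 0.5]
tau3 = n_subjettiness(pt, eta, phi, axes=list(zip(eta, phi)))
tau2 = n_subjettiness(pt, eta, phi, axes=list(zip(eta[:2], phi[:2])))
```

For this idealised jet, $\tau_3$ vanishes while $\tau_2$ stays finite because the third prong is away from both remaining axes, which is the behaviour that makes $\tau_3/\tau_2$ small for three-prong jets.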
Meanwhile, it is worth finding out whether our CNN is able to learn both the N-prong structure and the jet invariant mass. This can be seen through the tagging efficiencies of their combinations. In Fig. 3, the performances of the CNN + N-subjettiness (SJ) and the CNN + jet invariant mass (M) are shown by the cyan and green solid lines, respectively. As before, the combinations are handled by the BDT method. The CNN + SJ does not show much improvement over the CNN alone, while the tagging efficiency is improved by a factor of a few after the jet invariant mass is included. Thus, we conclude that the full information on the prong structure of a jet can be learned by the CNN, but the jet invariant mass cannot be directly extracted from the jet image by the current method. One of the reasons is that the image preprocessing procedures do not respect the Lorentz symmetry, so the jet invariant mass information is lost in the preprocessing. In the above studies, the neutralino mass has been taken to be 100 GeV in all event samples. In practice, especially since our method is meant to improve signal discovery, the neutralino mass is an unknown parameter. It is unrealistic to have a CNN trained at exactly the neutralino mass of the signal that we want to probe. The best one can do in a discovery search is to train the CNN on a number of benchmark neutralino masses and apply each CNN to a relatively wide range around its benchmark mass. The applicability of a CNN trained at a fixed neutralino mass to signals with different neutralino masses can be seen in Fig. 4. In the left panel, we show the performance of the CNN that is trained on the sample with $m_{\tilde{\chi}^0_2} = 100$ GeV and applied to samples with neutralino masses in the range [70,150] GeV. We find that varying the neutralino mass in the range [90,125] GeV does not reduce the sensitivity much, and that the CNN is more vulnerable to lower neutralino masses.
On the other hand, the CNN can be more useful if it is used in combination with the jet invariant mass. In the right panel, the performances of the combined CNN + M method for different neutralino masses are shown. The information from the jet invariant mass improves the tagging efficiency considerably, especially in the light neutralino mass region, compensating for the weakness of the CNN. The efficiency of the CNN + M method depends only mildly on the neutralino mass. To conclude, including the jet invariant mass not only improves the tagging efficiency but also extends the range of applicability of our CNN. They should be used together in realistic signal searches.

V. APPLICATION TO THE LHC GLUINO SEARCH
Having shown the power and generality of our CNN method, we are ready to show an explicit application to an RPV gluino search. The signal process is gluino pair production, in which each gluino decays into two quarks and a neutralino. The neutralino decays through the hadronic RPV operator into three quarks. This signal has been searched for by the ATLAS collaboration in Ref. [51]. For a neutralino with mass ∼ 100 GeV, gluinos lighter than ∼ 1.1 TeV have been excluded. The dominant background process in the search is the QCD multijet background. In this section, we show how the CNN helps to improve the ATLAS gluino search. Before that, we need to recast the experimental analysis for both signal and background.
Even though it is difficult to simulate the QCD multi-jet process exactly, for our purpose it is sufficient to use the MadGraph5 framework to generate its events at leading order. To save computing resources, and in accordance with the cuts adopted in the ATLAS analysis, we only consider processes with 4 or 5 jets at parton level, where each jet should have $p_T > 200$ GeV and |η| < 2.0. The matching of these processes is handled by the MLM method in MadGraph5. Events with higher jet multiplicity are provided by initial state radiation and final state radiation, which are performed in Pythia8. The signal events are generated at leading order as well, based on benchmark points with a neutralino mass of 100 GeV and gluino masses varying in the range [1,2] TeV with a step size of 100 GeV.
We recast the ATLAS analysis [51] as follows.
(1) For each event, large-R jets are reconstructed by the anti-$k_t$ algorithm with radius parameter R = 1.0. A "trimming" process [52] with subjet radius parameter $R_{\rm subjet} = 0.2$ and minimal transverse momentum fraction of 5% is applied to each large-R jet. The resulting trimmed large-R jets are required to have $p_T > 200$ GeV and |η| < 2.0. The analysis only selects events with at least three trimmed large-R jets ($N_{\rm jet} \geq 3$), in which the leading one should have $p_T > 440$ GeV.
(2) Meanwhile, the small-R jets of each event are reconstructed by the anti-$k_t$ algorithm with radius parameter R = 0.4. They are required to have $p_T > 50$ GeV and |η| < 2.5. Those jets are used to count the number of b-tagged jets ($N_b$) in the final state. The b-tagging efficiency is taken to be 70% [53], with mis-tagging rates for charm- and light-flavor jets of 0.15 and 0.008, respectively.
(3) Two discriminating variables are defined for each event: the total jet mass variable ($M_J^{\Sigma}$), which is the scalar sum of the invariant masses of the four leading trimmed large-R jets, and the pseudorapidity difference between the two leading trimmed large-R jets ($|\Delta\eta_{12}|$).
(4) Four signal regions are defined as shown in Tab. I.
In our analysis, we find that the 4jSRb1 signal region always provides the most sensitive probe. Therefore, our simulated signal and background events that pass all of the selections of the 4jSRb1 signal region are kept for later analysis. The signal cross section at this stage is $\sigma_{13}(\tilde{g}\tilde{g}) \times \epsilon_{\rm 4jSRb1}$, where $\sigma_{13}(\tilde{g}\tilde{g})$ is the gluino pair production cross section at the 13 TeV LHC, which can be calculated at next-to-leading order by Prospino2 [54], and $\epsilon_{\rm 4jSRb1}$ is the selection efficiency of the 4jSRb1 signal region obtained from our recast analysis. The background cross section ($\sigma_{\rm BG}$) at this stage is simply estimated by the numbers in the "SM predicted" column of Tab. I divided by the integrated luminosity of the analysis, $L = 14.8$ fb$^{-1}$.
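The large-R jet preselection and the two event-level variables of steps (1) and (3) can be sketched in plain Python. The signal-region thresholds on $M_J^{\Sigma}$ and $|\Delta\eta_{12}|$ are defined in Tab. I and are deliberately not applied here; the function names, the tuple layout `(pt, eta, mass)` and all numerical inputs in the example are hypothetical.

```python
def event_variables(jets):
    """jets: list of (pt [GeV], eta, mass [GeV]) for trimmed large-R jets.

    Returns None if the event fails the preselection, otherwise the jet
    multiplicity, the summed mass of the (up to) four leading jets and the
    pseudorapidity gap of the two leading jets.
    """
    selected = sorted((j for j in jets if j[0] > 200.0 and abs(j[1]) < 2.0),
                      key=lambda j: j[0], reverse=True)
    if len(selected) < 3 or selected[0][0] <= 440.0:
        return None
    return {
        "n_jet": len(selected),
        "M_J_sum": sum(j[2] for j in selected[:4]),
        "d_eta12": abs(selected[0][1] - selected[1][1]),
    }

def expected_events(sigma_fb, efficiency, lumi_fb=14.8):
    # Event yield = cross section x selection efficiency x integrated luminosity.
    return sigma_fb * efficiency * lumi_fb

# Hypothetical event: two of the five jets fail the pT or eta requirement.
jets = [(500.0, 0.5, 120.0), (300.0, -0.7, 90.0), (250.0, 1.0, 80.0),
        (150.0, 0.1, 60.0), (220.0, 2.4, 70.0)]
passed = event_variables(jets)
failed = event_variables([(400.0, 0.0, 100.0), (300.0, 0.1, 90.0), (250.0, 0.2, 80.0)])
n_sig = expected_events(sigma_fb=10.0, efficiency=0.1)   # illustrative numbers only
```

The second event is rejected because its leading jet has $p_T$ below 440 GeV.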
Now, we can apply the CNN tagging to the jets in the selected signal and background events. Firstly, in each of the selected events, jets are reconstructed in the same way as in the training of our CNN, i.e. by the anti-$k_t$ algorithm with radius parameter R = 1.0 and transverse momentum $p_T > 100$ GeV. Since the two neutralino jets can be either energetic or relatively soft in the signal process, all of the reconstructed jets are passed to our CNN for neutralino tagging. Each of them is assigned a signal probability (the CNN has two outputs, the signal and background probabilities, and the background probability is one minus the signal probability). Then, the jets are ranked by their signal probability. The distributions of the signal probabilities for the leading three jets are shown in Fig. 5, from which we can see that jets in signal events obtain larger probabilities than those in background events. This information can help to separate the signal and background even further.
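The ranking step can be written as a small helper that turns the per-jet CNN outputs of an event into a fixed-length list of the three leading signal probabilities, suitable as input features for a multivariate classifier. Padding events with fewer than three jets by zero is an assumption made here for illustration, and the function name `leading_scores` is hypothetical.

```python
def leading_scores(jet_scores, n_lead=3, pad=0.0):
    """Rank jets by CNN signal probability and keep the n_lead highest values."""
    ranked = sorted(jet_scores, reverse=True)
    # Assumption: pad with a default value if the event has fewer jets than n_lead.
    ranked += [pad] * (n_lead - len(ranked))
    return ranked[:n_lead]

features = leading_scores([0.2, 0.9, 0.5, 0.7])
padded = leading_scores([0.4])
```

Here `features` holds the scores of the three most neutralino-like jets in descending order.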
Because the CNN scores (signal probabilities) of the leading three jets of signal and background are correlated to some extent, we again employ the BDT method to study the discriminating power of the combination of this information. For each signal with a given gluino mass, applying a cut on the BDT response reduces the signal and background cross sections further down to $\sigma_{13}(\tilde{g}\tilde{g}) \times \epsilon_{\rm 4jSRb1} \times \epsilon^{\rm BDT}_{S}$ and $\sigma_{\rm BG} \times \epsilon^{\rm BDT}_{B}$, respectively, where $\epsilon^{\rm BDT}_{S/B}$ is the selection efficiency of the BDT cut on signal/background events. To calculate the p-value, we assume that the observed event number as well as the background uncertainty is reduced by the same factor $\epsilon^{\rm BDT}_{B}$. However, there is a lower limit on the background uncertainty, $\sqrt{L \times \sigma_{\rm BG} \times \epsilon^{\rm BDT}_{B}}$, which corresponds to the unavoidable statistical uncertainty. The BDT cut that maximizes the p-value is taken for each gluino mass. The p-values for the analysis with and without the CNN tagging are shown in Fig. 6. Before the CNN tagging, our recast shows that the benchmark points with gluino mass below ∼ 1.18 TeV can be excluded at 95% C.L., which coincides with the experimental result. The analysis with the CNN neutralino tagging pushes the lower limit on the gluino mass up to ∼ 1.35 TeV, which is an improvement of about 15%.
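The bookkeeping of yields after a BDT cut can be sketched as follows. The sketch assumes that the statistical floor on the background uncertainty is the Poisson value $\sqrt{B}$ of the background yield after the cut; it does not attempt to reproduce the p-value calculation itself. The function name and all numbers in the example are hypothetical.

```python
import math

def yields_after_bdt_cut(sig_xsec_fb, eff_sr, eff_bdt_sig,
                         bkg_events, bkg_unc, n_obs, eff_bdt_bkg,
                         lumi_fb=14.8):
    """Scale signal, background, observation and uncertainty by the BDT efficiencies."""
    signal = lumi_fb * sig_xsec_fb * eff_sr * eff_bdt_sig
    background = bkg_events * eff_bdt_bkg
    observed = n_obs * eff_bdt_bkg
    # The scaled uncertainty cannot fall below the statistical one (assumed sqrt(B)).
    uncertainty = max(bkg_unc * eff_bdt_bkg, math.sqrt(background))
    return {"signal": signal, "background": background,
            "observed": observed, "uncertainty": uncertainty}

# Illustrative inputs only, not values from the ATLAS analysis.
result = yields_after_bdt_cut(sig_xsec_fb=10.0, eff_sr=0.1, eff_bdt_sig=0.5,
                              bkg_events=100.0, bkg_unc=20.0, n_obs=100.0,
                              eff_bdt_bkg=0.01)
```

In this example the scaled systematic uncertainty (0.2 events) is smaller than the statistical floor (1 event), so the floor is used.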

VI. CONCLUSION
We have shown, for the first time, how the deep learning technique can be applied to a realistic collider search. Specifically, we considered the existing search by the ATLAS Collaboration that aims to find or exclude the gluino of an RPV SUSY scenario in the final state with multiple energetic jets.
The information of a jet can be formatted into a jet image by identifying each calorimeter cell as a pixel, and the energy deposit in the cell as the intensity (or grayscale color) of that pixel. A CNN that is capable of tagging a neutralino jet by using the jet image is constructed in a model-independent way. Given the small size and sparseness of the jet image, the CIFAR-10 architecture of the CNN is adopted. It is able to tag the neutralino jet with an efficiency of 50% while accepting only ∼1% of QCD jets. These efficiencies were found to be insensitive to the CNN parameters over a wide range. Moreover, due to the cylindrical shape of the detector, the jet image has a strong dependence on the pseudorapidity of the jet. Nevertheless, the CNN performs well for jets either in the central region (|η| → 0) or with relatively large pseudorapidity (|η| ≲ 2.5). Our CNN can outperform the high-level jet substructure variable N-subjettiness by a factor of a few. However, the jet invariant mass information is not fully learned by the CNN, partly because the image preprocessing does not respect the Lorentz symmetry. Combining the CNN output with the jet invariant mass can improve the signal efficiency further. More importantly, our CNN is trained and tested on a sample with a given neutralino mass. The tagging method that combines the CNN and the jet mass performs much better than the CNN alone when applied to the neighborhood of that neutralino mass.
To study the realistic application of the CNN, the ATLAS analysis is recast. Only the events (for both signal and background) that pass all selection cuts of the 4jSRb1 signal region in the ATLAS analysis are kept. The CNN assigns a "neutralino jet probability" to every jet in these events. The jets in signal events are likely to obtain higher "neutralino jet probabilities" than those in background events. Using this additional information, for a neutralino mass of 100 GeV, we found that the 95% C.L. exclusion bound on the gluino mass can be pushed higher by ∼200 GeV.