Towards constraining triple gluon operators through tops

Effective field theory techniques provide us with important tools to probe for physics beyond the Standard Model in a relatively model-independent way. In this work, we revisit the CP-even dimension-6 purely gluonic operator and investigate the possible constraints on it by studying its effect on top-pair production at the LHC, in particular the high-$p_T$ and high-$m_{t\bar{t}}$ tails of the distributions. A cut-based analysis reveals that the scale of New Physics, when this operator alone contributes to the production process, is greater than 3.6 TeV at 95% C.L., a much stronger bound than the 850 GeV obtained from Run-I data using the same channel. This is reinforced by an analysis using Machine Learning techniques. Our study complements similar studies that have focussed on other collider channels to probe this operator.


I. INTRODUCTION
The Standard Model (SM) has been put through great scrutiny by several collider experiments, such as LEP, the Tevatron, Belle, BaBar and the LHC. Intriguingly, except for a few possible anomalies, for instance in the flavour sector (for experimental results, see Refs. [1][2][3][4][5] and for some theoretical works, see, for instance, Refs. [6][7][8][9][10][11][12][13][14]), it has held up remarkably well even at energies far above the electroweak scale. This is in spite of tantalising hints of theoretical structures that we are currently unaware of. Despite overwhelming evidence for the existence of Dark Matter and for neutrino mass differences (through experiments like DAMA, LUX, PAMELA, PandaX, XENON and SIMPLE among others, see Ref. [15] and the references therein, along with neutrino studies like BOREXINO, Double Chooz, DUNE, Super-K, MiniBooNE, NEXT, OPERA and others, see Ref. [16] and the references therein), no definitive proof of physics beyond the Standard Model (BSM) has emerged. This has further motivated direct searches for popular BSM models, complemented by model-independent search strategies. These endeavours have been aided by the enormous amounts of data collected by the CMS [17] and ATLAS [18] experiments, ushering in a new age of ever more precise measurements.
One way to quantify deviations from the SM is to perform a systematic study of the consequences of all applicable effective field theory (EFT) operators consistent with the known symmetries. This is quite ambitious, and it is usually worthwhile to narrow the focus to a few operators at a time, whose effects may have the greatest chance of showing up at the LHC. One such set of operators comprises, presumably, those generated by new coloured particles; see for instance Refs. [19][20][21][22][23][24]. A prototypical operator of this nature at dimension-6 is the triple gluon operator
$$O_G = f^{abc}\, G^{a,\mu}_{\ \ \nu}\, G^{b,\nu}_{\ \ \rho}\, G^{c,\rho}_{\ \ \mu}, \quad (1)$$
where $G_{\mu\nu} = -\frac{i}{g_s}[D_\mu, D_\nu]$ and $D_\mu = \partial_\mu + i g_s t^a A^a_\mu$. The operator in Eq. (1) can produce several vertices; among them, the triple gluon vertex will be of special interest in this paper. This vertex can be represented as shown in Fig. 1.

FIG. 1: Triple gluon vertex
It is the only purely gluonic CP-even operator at dimension-6 [25]. The CP-odd counterpart of this operator, given by $\tilde{O}_G = f^{abc}\, \tilde{G}^{a,\mu}_{\ \ \nu}\, G^{b,\nu}_{\ \ \rho}\, G^{c,\rho}_{\ \ \mu}$, where $\tilde{G}_{\mu\nu} = \frac{1}{2}\epsilon_{\mu\nu\rho\sigma} G^{\rho\sigma}$, is called the Weinberg operator [26] and is highly constrained by the results of several low-energy experiments, notably by the measurement of the neutron EDM [27].
The operator $O_G$, on the other hand, plays a crucial role in any process involving gluon self-interactions, such as dijet and multi-jet production [28], or Higgs + jets production through the $Hgg$ vertex [29]. Assuming the absence of any other EFT operator, our Lagrangian is then given by
$$\mathcal{L} = \mathcal{L}_{SM} + \frac{c_G}{\Lambda^2}\, O_G, \quad (2)$$
where $c_G$ is the Wilson coefficient and $\Lambda$ is the scale of new physics (NP).
While it might seem that this operator is best probed by looking at the $gg \to gg$ scattering process, the helicity structure of the amplitude involving the $O_G$ operator is orthogonal to that of the SM QCD amplitude, so the two do not interfere at the lowest order, i.e. at $O(1/\Lambda^2)$ [28,30]. The lowest-order contribution of this operator to the squared matrix element therefore only appears at $O(1/\Lambda^4)$.
The other obvious channel with which to investigate the operator is the $gg \to q\bar{q}$ process. Here the amplitudes from the $O_G$ operator and from the SM do interfere at $O(1/\Lambda^2)$, with the interference term proportional to the square of the quark mass, $m_q^2$. Naturally, this suggests that the best choice for the final state is the top quark: not only do we gain from its high mass, but it is also easier to tag than the b-quark or the c-quark at the LHC [31,32].
The squared matrix elements for the SM contribution, the interference term (which occurs at $O(1/\Lambda^2)$) and the pure EFT operator term (which occurs at $O(1/\Lambda^4)$) for the $gg \to t\bar{t}$ process are given in Eqs. (3)-(7) [28]. Here, $(\hat{s}, \hat{t}, \hat{u})$ are the Mandelstam variables constructed from the momenta of the initial-state gluons and final-state top quarks. These expressions can be used to calculate the parton-level differential cross-section with respect to the invariant mass of the top pair ($m_{t\bar{t}}$). The integral is over the cosine of the polar angle $\theta$, defined as the angle between the beam axis (z-axis) and the momentum direction of the top quark travelling in the +z direction. The normalised differential cross-section is plotted in Fig. 2; it shows the behaviour of the SM contribution compared to those of the interference and purely NP terms.
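For orientation, these pieces enter in the standard parton-model form (a sketch in our notation; overall factors and conventions may differ from those of Ref. [28]):

```latex
\frac{d\hat\sigma}{d\cos\theta}\bigg|_{\hat s = m_{t\bar t}^2}
  = \frac{\beta_t}{32\pi\,\hat s}\;\overline{|\mathcal{M}|^2}(\hat s,\hat t,\hat u),
\qquad
\beta_t = \sqrt{1 - \frac{4m_t^2}{\hat s}},
```

followed by a convolution with the gluon-gluon luminosity,

```latex
\frac{d\sigma}{dm_{t\bar t}}
  = \frac{2\,m_{t\bar t}}{s}\,
    \frac{d\mathcal{L}_{gg}}{d\tau}\bigg|_{\tau = m_{t\bar t}^2/s}
    \int_{-1}^{1} d\cos\theta\;\frac{d\hat\sigma}{d\cos\theta},
\qquad
\frac{d\mathcal{L}_{gg}}{d\tau} = \int_\tau^1 \frac{dx}{x}\; g(x,\mu_F)\, g\!\left(\frac{\tau}{x},\mu_F\right).
```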
FIG. 2: Plots of the normalised parton-level differential cross-section with respect to $m_{t\bar{t}}$, calculated for the process $gg \to t\bar{t}$ using the SM, interference and purely NP matrix element terms. The intercept of the curves on the x-axis is at $2m_t$.

FIG. 3: Some of the tree-level Feynman diagrams for the $pp \to t\bar{t}$ process (with no additional hard jets) in the SM are shown in subfigures (a)-(c). Please refer to Appendix A for the full list of relevant diagrams. The last diagram (d) shows the same process, but now with one insertion of the $O_G$ operator, indicated by the filled square at a vertex. Note that this is the only such possible NP diagram.

From Eqs. (3)-(7), as well as from Fig. 2, it is clear that the contribution of the NP term arising purely from the $O_G$ operator increases with energy and, at high enough energies, can compensate for the suppression from the extra powers of $\Lambda$. Of course, it is not possible to access arbitrarily high energies in this framework, since the validity of our EFT approach would eventually break down.
The $t\bar{t}$ production process has been widely studied at the LHC. To be concrete, we obtain the data relevant to our analysis from Ref. [33], by the CMS collaboration. That study looked at the production of top quark pairs along with additional jets, in events with lepton+jets. The reference provides unfolded distributions of various kinematic variables, like those of $p_T$ and $m_{t\bar{t}}$ among others, in terms of parton-level top quarks. This involves reconstruction of the final state and unfolding of the obtained data. While unfolding removes the effects of the detector on the data to a large extent, reconstruction provides us with information about the undecayed top quark state at the parton and particle levels. This enables us to perform our analysis using parton-level top quark states, and we therefore adopt this dataset as our prototypical source.
Our analysis requires us to generate samples of $t\bar{t} + 0j$ and $t\bar{t} + 1j$, where $j$ represents an additional hard jet in the final state arising out of a hard parton at the matrix element level. The samples are generated in MadGraph5_aMC@NLO v2.6.7 (MG5) [34] using UFO files made with FeynRules v2.3.32 [35]. Since the analysis is carried out at the parton level, there is no need for showering.
One of the channels that we utilise will be the pp → tt + 0j process. Some examples of the SM Feynman diagrams which contribute to this process are given in the (a)-(c) subfigures of Fig. 3, whereas the insertion of the O G operator generates the diagram in the last subfigure. If we include a hard jet in the final state, we have two classes of NP Feynman diagrams -one with only one insertion of O G and the other with two such insertions -both of which contribute to the final state amplitude for pp → tt + 1j. Some examples of the former class can be found in Fig. 4, while the rest have been listed in Appendix A. All diagrams from the latter class can be found in Fig. 5. Also note that, unlike in the case of exclusive tt production, in addition to di-gluon initial states, the qg and qq initial states also contribute to the production cross-section of tt + 1j, where q is a quark from the proton, most often the u or d. These additional subprocesses contribute to the increase of sensitivity when one includes additional hard jets in the process.
The addition of a hard jet (viz. a jet arising from a hard parton at the matrix element level) in the final state is expected to change the differential cross-section distributions. SM events of this type can be generated at NLO using MG5, and are then merged in Pythia v8.243 [36] using the FxFx matching algorithm [37] before showering. The normalised differential cross-sections of the SM NLO events, one with respect to the transverse momentum of the hardest top ($p_T(t_{high})$) and the other with respect to the invariant mass of the top pair ($m_{t\bar{t}}$), are plotted in Fig. 6.
Our choice of $t\bar{t}$ production is motivated primarily by the fact that the final state is quite clean. As we shall see in the next section, this channel yields a bound on the scale of NP of $\Lambda/\sqrt{c_G} > 3.6$ TeV, a significant improvement over the bound of $\Lambda/\sqrt{c_G} > 850$ GeV obtained using Run-I data [38,39] for the same channel. An independent method to constrain the scale of NP for this operator is to use multijet final states. This was done in Ref. [23] and more recently scrutinised in Ref. [40], leading to somewhat stronger bounds than the ones we obtain. Our goal in this paper is to investigate the bounds that the clean and complementary channel of $t\bar{t}$ production yields. Our paper is arranged as follows: in Sec. II, we use the MC events from the $pp \to t\bar{t}$ process to estimate the value of each contribution to the cross-section. This is then used in a chi-square analysis to put constraints on the value of the NP scale. We also explore how the addition of a single hard jet to this process changes our constraint. In Sec. III, we employ two different machine learning techniques to try and improve the bounds obtained in the previous sections. We summarise and conclude in Sec. IV.

II. BOUNDS FROM THE $p_T$ AND $m_{t\bar{t}}$ DISTRIBUTIONS
FIG. 6: Binned plots of the normalised SM NLO differential cross-section, with respect to $p_T(t_{high})$ (top) and $m_{t\bar{t}}$ (bottom), for $t\bar{t}$ + jets. The solid line is for $t\bar{t}\,+$ up to $1j$ and the dashed line is for $t\bar{t}\,+$ up to $2j$. The figures illustrate the change in the shape of the distribution when an additional hard jet is added to the final state.

Considering $pp \to t\bar{t}$ with no additional partons in the hard process ($pp \to t\bar{t} + 0j$), which entails the insertion of up to one NP vertex as indicated in the representative diagrams in Fig. 3, the total cross-section can be written as the following sum of the different contributions:
$$\sigma_{t\bar{t}} = \sigma_{t\bar{t},\,SM} + \frac{c_G}{\Lambda^2}\,\sigma_{t\bar{t},\,\Lambda^2} + \frac{c_G^2}{\Lambda^4}\,\sigma_{t\bar{t},\,\Lambda^4}. \quad (8)$$
One can use MG5 to compute each of the terms in the cross-section exclusively for one or two insertions of the $O_G$ operator at the cross-section level, thus obtaining the contribution of each term to the total cross-section for a particular value of $c_G$ and $\Lambda$. The only Feynman diagram with the $O_G$ operator that contributes to this process is shown in the bottom row of Fig. 3. The interference of this diagram with the SM ones gives rise to the $O(1/\Lambda^2)$ term, and the square of the amplitude of this diagram gives rise to the $O(1/\Lambda^4)$ term.
The Monte Carlo event generation in MG5 uses the following selection criteria and parameters: $p_T^{jet} > 20$ GeV, $|\eta^{jet}| < 2.5$, $m_t = 173$ GeV, PDF set nn23lo1, and dynamical scale choice 1, which sets the dynamical scale used for factorisation and renormalisation equal to the total transverse energy of the event.
The cross-section term with $n$ insertions of the $O_G$ operator is generated using the MG5 coupling-order syntax NP^2==n QCD<=99 QED==0. We also use this syntax, with appropriate values of $n$, when considering additional jets later in this section. We keep the values of $c_G$ and $\Lambda$ unchanged for the event generation.
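As an illustrative sketch, the MG5 commands selecting the individual terms of the $t\bar{t}$ cross-section decomposition would look like the following (the UFO model name is a placeholder, not necessarily the one used in this work):

```text
import model triple_gluon_UFO                 # placeholder UFO model name
generate p p > t t~ NP^2==0 QCD<=99 QED==0    # pure SM term
generate p p > t t~ NP^2==1 QCD<=99 QED==0    # O(1/Lambda^2) interference term
generate p p > t t~ NP^2==2 QCD<=99 QED==0    # O(1/Lambda^4) pure-NP term
```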
We will obtain the bounds on the scale of NP using the binned transverse momentum distribution of the hardest top, $p_T(t_{high})$, and the binned distribution of the invariant mass of the $t\bar{t}$ pair, $m_{t\bar{t}}$. First, we compute these for the $pp \to t\bar{t} + 0j$ process.
Here, $t_{high}$ refers to the highest-$p_T$ top quark. The cross-section $\sigma_{t\bar{t},\,SM}$ is calculated from events generated using MG5 and showered in Pythia8 using the FxFx merging algorithm. The total SM cross-section has been normalised to the latest theoretical prediction of 832.0 pb [33,41,42,43]. The cross-sections $\sigma_{t\bar{t},\,\Lambda^2}$ and $\sigma_{t\bar{t},\,\Lambda^4}$ have been calculated using the process $pp \to t\bar{t}$ (no additional hard jets) in MG5. The $\sigma_{Exp}$ values are taken from Table 13 in Ref. [33], suitably adjusted for bin width and branching ratio.
To this end, in Table I, the values of the binned parton-level cross-section are shown for each of the terms in the $pp \to t\bar{t}$ cross-section using $c_G = 1$, $\Lambda = 1$ TeV. All cross-section values are at LO, except for the SM cross-section, which is calculated at NLO as described in the previous section. It is worth noting that the values of $c_G$ and $\Lambda$ used for event generation are chosen for reference only; starting from these values, one can easily scale to other desired values of these parameters. Also shown in the table are the values of the experimental cross-section obtained from Table 7 of Ref. [33]. The values in the SM column have been scaled so that the total cross-section comes out to be 832.0 pb, which is the theoretically predicted inclusive $t\bar{t}$ cross-section [44]. Similarly, in Table II, the values of the parton-level cross-section for $pp \to t\bar{t}$, binned in the $m_{t\bar{t}}$ variable, are shown. Here again the SM column has been scaled to a total of 832.0 pb, and the values of the last column, showing the expected cross-section, are taken from Table 13 of Ref. [33].

FIG. 8: Plot of $\Delta\chi^2$ as a function of $c_G/\Lambda^2$, with and without the theoretical uncertainty ($\Delta\sigma_{th}$). The data used is the cross-section binned in $m_{t\bar{t}}$ given in Table II.
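Since the term with $n$ operator insertions at the cross-section level scales as $c_G^n/\Lambda^{2n}$, rescaling the tabulated bins from the reference point $c_G = 1$, $\Lambda = 1$ TeV is a one-liner. A minimal sketch (the bin values below are invented for illustration, not those of Table I):

```python
def sigma_bin(cG, Lam_TeV, terms):
    """Total cross-section in one bin for arbitrary (c_G, Lambda), given the
    per-term values generated at the reference point c_G = 1, Lambda = 1 TeV.
    `terms` = [sigma_SM, sigma_L2, sigma_L4, ...]; the term with n operator
    insertions at the cross-section level scales as c_G^n / Lambda^(2n)."""
    return sum(t * cG**n / Lam_TeV**(2 * n) for n, t in enumerate(terms))

# Example: a bin with SM = 10 pb, interference = -0.2 pb, pure NP = 0.05 pb
# (illustrative numbers only), evaluated at c_G = 1, Lambda = 2 TeV.
sigma = sigma_bin(1.0, 2.0, [10.0, -0.2, 0.05])
```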
We can calculate the value of the $\chi^2$ as a function of $c_G/\Lambda^2$. The relevant formulae we utilise for the calculations are given by
$$\chi^2 = \sum_i \frac{\left(\sigma_i^{th} - \sigma_i^{Exp}\right)^2}{(\Delta\sigma_i)^2}, \qquad \sigma_i^{th} = \sigma_{i,\,SM} + \frac{c_G}{\Lambda^2}\,\sigma_{i,\,\Lambda^2} + \frac{c_G^2}{\Lambda^4}\,\sigma_{i,\,\Lambda^4}. \quad (9)$$
Here, $\Delta\sigma_i$ is the total uncertainty on the cross-section in the $i$-th bin, obtained by adding the statistical uncertainty ($\Delta\sigma_i^{stat}$), the systematic uncertainty ($\Delta\sigma_i^{sys}$) and the theoretical uncertainty ($\Delta\sigma_i^{th}$) in the bin in quadrature.
The calculation of the χ 2 using the formula in Eqn. 9 entails the use of the total tt production cross-section.
We can either use the MG5 aMC@NLO cross-section or the central value of the experimental cross-section as the value of the SM cross-section. This provides us with 'observed' and 'expected' bounds respectively.
The theoretical uncertainty is assumed to be $\Delta\sigma_{th} = 5\%$ of the central value of $\sigma_{Exp}$ in each bin, since the relative theory uncertainty on the theoretically calculated total cross-section (832.0 pb) is of the same order.
Subtracting the value of the curve at its minimum, we obtain the plot of $\Delta\chi^2$ as a function of $c_G/\Lambda^2$. The curves obtained for $\Delta\chi^2$ using the binned $p_T(t_{high})$ data are shown in Fig. 7, and those obtained using the binned $m_{t\bar{t}}$ data are shown in Fig. 8.
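The $\Delta\chi^2$ scan itself is straightforward. The following sketch uses invented placeholder bin values (not those of Tables I and II) and, for simplicity, the one-parameter 95% C.L. cut $\Delta\chi^2 < 3.84$ rather than the multi-bin cutoffs quoted in the text:

```python
import numpy as np

# Placeholder binned cross-sections (pb) at the reference point c_G = 1, Lambda = 1 TeV
sig_sm  = np.array([300.0, 150.0, 40.0, 8.0])   # SM term
sig_l2  = np.array([ -2.0,  -1.0,  1.0, 2.0])   # O(1/Lambda^2) interference term
sig_l4  = np.array([  5.0,  10.0, 20.0, 40.0])  # O(1/Lambda^4) pure-NP term
sig_exp = np.array([295.0, 148.0, 41.0, 8.5])   # "measured" central values
dsig    = np.array([ 15.0,   8.0,  3.0, 1.0])   # total (stat + sys + th) uncertainty

def chi2(x):
    """chi^2 as a function of x = c_G / Lambda^2 (in TeV^-2), following Eq. (9)."""
    theory = sig_sm + x * sig_l2 + x**2 * sig_l4
    return np.sum((theory - sig_exp) ** 2 / dsig**2)

xs = np.linspace(-0.5, 0.5, 2001)
dchi2 = np.array([chi2(x) for x in xs])
dchi2 -= dchi2.min()                            # Delta chi^2 relative to the best fit
allowed = xs[dchi2 < 3.84]                      # 95% C.L. interval for one parameter
bound = 1.0 / np.sqrt(np.abs(allowed).max())    # lower bound on Lambda/sqrt(c_G), TeV
```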
The resulting bounds are given in Table III.

TABLE III: Bounds on $\Lambda/\sqrt{c_G}$ found from the plots of Fig. 7 and Fig. 8, after including $\Delta\sigma_{th}$. The cutoff for $\Delta\chi^2$ is $\chi^2_{cut} = 19.68$ for the $p_T(t_{high})$ case and $\chi^2_{cut} = 16.92$ for the $m_{t\bar{t}}$ case.
As expected, the bounds we get on $\Lambda/\sqrt{c_G}$ using all the terms up to $O(1/\Lambda^4)$ in the cross-section are stronger than those obtained using only terms up to $O(1/\Lambda^2)$. Note that we get similar bounds using the $p_T(t_{high})$ and $m_{t\bar{t}}$ distributions. Also note that the 'observed' bound is stronger than the 'expected' bound. This is because, in most bins in Tables I and II, the value of the SM cross-section obtained from MG5 is larger than the central value of the experimental cross-section. This leads to a greater pull away from the experimental value (which is used in the calculation of the $\chi^2$) and thus results in a stronger bound.

Additional jets
Let us now turn our attention to scenarios with additional jets in the final state. We expect to obtain stronger bounds by including additional hard QCD jets, arising from extra partons in the hard event, in the $t\bar{t}$ final state, due to the additional operator insertions possible.
Consider the process in which the final state has up to one additional jet, $pp \to t\bar{t}\,+$ up to $1j$. In this process, we can have a maximum of two insertions of the $O_G$ operator, as may be seen explicitly from Figs. 4 and 5. As is clear from these figures, additional production channels with new initial states open up, viz. the $qg$ and the $q\bar{q}$ initial states. The cross-section for the process with one additional jet in the final state can be written as
$$\sigma = \sigma_{SM} + \frac{c_G}{\Lambda^2}\,\sigma_{\Lambda^2} + \frac{c_G^2}{\Lambda^4}\,\sigma_{\Lambda^4} + \frac{c_G^3}{\Lambda^6}\,\sigma_{\Lambda^6} + \frac{c_G^4}{\Lambda^8}\,\sigma_{\Lambda^8}. \quad (10)$$
For the process of interest ($pp \to t\bar{t}\,+$ up to $1j$), certain terms in Eq. (10) receive contributions from exclusive $t\bar{t}$ production (i.e. no additional hard jets in the final state) as well as from the process with one additional hard jet. From Eq. (8), one can see that the exclusive process contributes only up to $O(1/\Lambda^4)$, whereas the latter contributes to all terms up to $O(1/\Lambda^8)$. Thus, the $O(1/\Lambda^6)$ and $O(1/\Lambda^8)$ contributions come purely from the process with an additional jet.
The effect of adding a QCD jet to the final state of the $t\bar{t}$ process, for various values of $\Lambda$, can be visualised in the plots of Fig. 9, which show the normalised differential cross-sections as functions of $p_T(t_{high})$ and $m_{t\bar{t}}$. Note that the higher-valued bins show more deviation from the exclusive $t\bar{t}$ shape than the lower-valued bins for both variables. This is expected, since higher powers of energy (or momentum) occur in the numerators of the higher-order terms in Eq. (10). Thus, at higher energies these terms contribute more, leading to the gain in the total cross-section of this process compared to the exclusive $t\bar{t}$ process. Also note that the deviation decreases as the value of $\Lambda$ used for the generation of the events gets larger. This too is expected, for the simple reason that the scale suppression in each of the terms in the cross-section (barring the SM term) increases with increasing values of $\Lambda$.
In Table IV, the contributions of the different terms to the total cross-section, binned in the variable $p_T(t_{high})$, are shown. Note that this is the cross-section of the inclusive process $pp \to t\bar{t}\,+$ up to $1j$. Similarly, in Table V, the binned cross-section contributions of the different terms are tabulated for the variable $m_{t\bar{t}}$ for the same sample.
The data from these two tables are again used to calculate $\Delta\chi^2 (= \chi^2 - \chi^2_{min})$ as in the previous analysis. This is done in two ways: first by using all the bins, and then using the data from only the last four bins of each table. Furthermore, we calculate the $\Delta\chi^2$ using the cross-section up to a given order, e.g. up to $O(1/\Lambda^2)$, up to $O(1/\Lambda^4)$, etc.
From the plots in Figs. 10 and 11, the bounds on $\Lambda/\sqrt{c_G}$ can be calculated. They are tabulated in Tables VI and VII. (The SM cross-sections from Tables I and II are used as the SM contribution to the total cross-section.)

TABLE VI: Bounds on $\Lambda/\sqrt{c_G}$ found from the plots of Fig. 10, after including $\Delta\sigma_{th}$. The cutoff for $\Delta\chi^2$ is $\chi^2_{cut} = 19.68$ for the $p_T(t_{high})$ plots and $\chi^2_{cut} = 16.92$ for the $m_{t\bar{t}}$ plots when all the bins are taken into account, and $\chi^2_{cut} = 7.82$ for both cases when only the last four bins are taken.

TABLE VII: Bounds on $\Lambda/\sqrt{c_G}$ found from the plots of Fig. 11, after including $\Delta\sigma_{th}$. The cutoff for $\Delta\chi^2$ is $\chi^2_{cut} = 19.68$ for the $p_T(t_{high})$ plots, $\chi^2_{cut} = 16.92$ for the $m_{t\bar{t}}$ plots, and $\chi^2_{cut} = 7.82$ for both cases when only the last four bins are taken.

Similar to the earlier scenario without any additional hard jets in the final state, the observed bound of $\Lambda/\sqrt{c_G} > 3.6$ TeV obtained using $p_T(t_{high})$ is stronger than the expected bound of about $\Lambda/\sqrt{c_G} > 2.3$ TeV. Furthermore, these bounds are considerably stronger than those obtained from the $t\bar{t} + 0j$ process, shown in Table III, for both the expected and observed cases. This is because the $pp \to t\bar{t}\,+$ up to $1j$ process involves higher orders of the NP contributions (up to $O(\Lambda^{-8})$) than the $pp \to t\bar{t} + 0j$ process (up to $O(\Lambda^{-4})$). Moreover, as evidenced by the list of Feynman diagrams, more subprocesses contribute to the $pp \to t\bar{t}j$ process than to the $pp \to t\bar{t}$ process, in both the SM and the NP contexts.

Note also that the bounds calculated using the last four bins are somewhat stronger than (or almost equal to) the bounds obtained using the data from all the bins. This is consistent with our earlier observation that
the contribution of terms from the $O_G$ operator to $t\bar{t}$ production grows with energy. Thus, by focussing on the higher $p_T(t_{high})$ or $m_{t\bar{t}}$ bins, we gain a bit on the bounds on $\Lambda/\sqrt{c_G}$. It should be noted that these bounds depend on the theoretical uncertainty, which has been taken to be $\Delta\sigma_{th} = 5\%$. The bounds weaken by 15-20% as the uncertainty is increased to 15%.

III. USING MACHINE LEARNING TECHNIQUES
We would now like to explore if any gains may be obtained by augmenting the analyses of the previous sections using machine learning techniques. To this end, we explore two methods to improve the reach. The first method is a Dense Classifier and it uses data passed to it event-by-event. This will be referred to as 'Event-based analysis'. The second method is to distill the information of the events in certain bins and form 'images' which are then fed into a Convolutional Neural Network (CNN) classifier. We will refer to this hereafter as the 'Bin-based analysis'. The details for each of these procedures are given in the subsections devoted to each of the analyses.
The Neural Networks (NNs) used are constructed and trained in Keras [46] with a Tensorflow [47] backend.

III.1. Event-based Analysis using a Dense Classifier
The events of the process $pp \to t\bar{t}$ (up to 1 hard jet) used in this analysis are generated in MG5. We obtain samples for the SM as well as for NP models with $\Lambda = 2, 3, 4, 5$ TeV. A cut of 50 GeV is applied on the $p_T$ of the additional jet at the generation level. Apart from this, a lower cut of 400 GeV is applied on the $p_T$ of the top, as well as a cut of 1000 GeV on the invariant mass of the top pair. These cuts are motivated by the fact that there is a significant difference between the SM samples and the NP samples in the high $p_T$ and $m_{t\bar{t}}$ bins. The sample generated using purely SM physics is called the 'SM sample', while an NP sample, containing events with both SM and NP contributions, is called a 'Full sample'.
Each event in the samples is characterised by four variables: $p_T(t_{high})$, $p_T(t_{low})$, $\Delta y (= |y(t_{high}) - y(t_{low})|)$ and $m_{t\bar{t}}$. The generated data are scaled using the RobustScaler from the scikit-learn package [48], which ensures that outliers in the data do not unduly affect the scaled data. After scaling, each of the variables is centred around zero and has an interquartile range of one. The classifier is trained on the SM sample and each of the Full samples, where the SM data is labelled '0' (zero) and the Full sample data is labelled '1' (one).
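A minimal sketch of this preprocessing step, implemented directly with NumPy to mirror scikit-learn's RobustScaler with its default settings (the feature values below are invented for illustration):

```python
import numpy as np

def robust_scale(X):
    """Centre each column on its median and divide by its interquartile range,
    so that outliers have little influence on the scaling."""
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return (X - med) / (q3 - q1)

# Toy feature matrix with columns mimicking (pT(t_high), pT(t_low), dy, m_tt)
rng = np.random.default_rng(0)
X = rng.normal([600.0, 450.0, 1.0, 1500.0], [150.0, 100.0, 0.5, 300.0], size=(1000, 4))
Xs = robust_scale(X)   # each column now has median 0 and interquartile range 1
```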
The classifier comprises four Dense layers with 512, 256, 128 and 1 neurons respectively. The activation function of all the layers except the last is the Rectified Linear Unit (ReLU), while for the last layer, softmax activation is used. The network is trained with the Adam optimizer using a learning rate of 0.001, with binary cross-entropy as the loss function.
The classifier is trained on 240,000 events, validated on 80,000 events and tested on 80,000 events, drawn from both the SM and Full samples. The results of the testing are shown as Receiver Operating Characteristic (ROC) curves; the training is done separately for each value of $\Lambda$. As the ROC curves in Fig. 12 demonstrate, the network can distinguish an NP sample with $\Lambda = 2$ TeV from the SM sample, and also one with $\Lambda = 3$ TeV, though with drastically reduced efficiency, as also reflected in the Area Under the Curve (AUC). The NP samples with $\Lambda = 4$ TeV and $\Lambda = 5$ TeV, however, are barely distinguishable from the SM sample. The Dense classifier can thus only be used to differentiate NP samples up to $\Lambda = 3$ TeV from the SM. This is consistent with the expected bound obtained in the previous section.
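The AUC quoted above can be computed without any plotting, as the probability that a randomly chosen NP event receives a higher classifier score than a randomly chosen SM event (the Mann-Whitney construction). A small self-contained sketch:

```python
import numpy as np

def auc(scores_sm, scores_np):
    """Area under the ROC curve via the Mann-Whitney statistic: the fraction
    of (NP, SM) score pairs in which the NP event scores higher, with ties
    counted as one half."""
    diff = scores_np[:, None] - scores_sm[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Perfectly separated scores give AUC = 1; identical distributions give 0.5
perfect = auc(np.array([0.1, 0.2]), np.array([0.8, 0.9]))   # -> 1.0
random_ = auc(np.array([0.5, 0.5]), np.array([0.5, 0.5]))   # -> 0.5
```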

III.2. Bin-based Analysis using a CNN Classifier
Instead of using individual events for our analysis, we can use the ensemble of events populating a set of bins and derive an 'image' from these. This is what we attempt in this section. An 'image' in this context is simply a matrix of suitably scaled numbers, representing a normalised 2D histogram. Our images are all monochromatic, i.e. the number of colour channels is one. Since Convolutional Neural Networks (CNNs) are well-equipped to handle images, we use a CNN classifier in this analysis.
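The construction of such an image is essentially one call to a 2D-histogram routine. A minimal sketch with invented bin edges and toy events (illustration only):

```python
import numpy as np

def make_image(pt, mtt, pt_edges, mtt_edges):
    """Turn a batch of events into one single-channel 'image': a normalised
    2D histogram in (pT, m_tt), stored as a matrix whose entries sum to one."""
    H, _, _ = np.histogram2d(pt, mtt, bins=[pt_edges, mtt_edges])
    return H / H.sum()

# Toy events: 50,000 (pT, m_tt) pairs drawn uniformly within the binned ranges
rng = np.random.default_rng(1)
pt = rng.uniform(0.0, 800.0, 50_000)        # GeV
mtt = rng.uniform(340.0, 3000.0, 50_000)    # GeV
img = make_image(pt, mtt, np.linspace(0.0, 800.0, 11), np.linspace(340.0, 3000.0, 11))
```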
Events from the $pp \to t\bar{t} + 0j$ process are used for this purpose. This process is chosen to enable us to later make comparisons with the numbers given in Table 16 of Ref. [33]. The transverse momentum quoted in that table is that of the hadronically decaying top quark, denoted there as $p_T(t_h)$, and we adopt this notation. This top quark is not necessarily the one with the highest $p_T$, which is a difficult criterion to meet with parton-level generated events: it is met only if the two hardest top quarks in the event have the same $p_T$, which is possible only in the $pp \to t\bar{t}$ process.
The CNN classifier is trained and validated on images formed from Monte Carlo (MC) events generated in MG5, with each image made from 50,000 events. We construct one such model for each value of the NP scale $\Lambda$ considered. These models are then tested on pseudo-data generated using the binned double differential cross-section (d.d.c.s.) values given by the CMS collaboration. The network is trained on 25 such images, validated on 10 more images and finally tested on 15 images.

The CNN
The CNN comprises two 2D convolutional layers with 16 and 32 output channels and a kernel size of 2×2, followed by one 2D convolutional layer with 64 output channels and a kernel size of 1×1. Dropout is applied to the output of this convolutional layer to prevent overtraining. The output is then flattened into a 1D array, which is passed through two Dense layers of 128 and 50 neurons, and then through a final Dense layer with one neuron to obtain the classifier output. The activation function used in the first two Dense layers is ReLU, while for the final layer, softmax is used. The Adam optimiser with the default learning rate of 0.001 is used, and the loss function is binary cross-entropy.
Three different CNN classifiers are trained and validated. The images used for training are derived from SM events and NP events. The three CNN classifiers correspond to three values of Λ = 2, 3, 4 TeV. The classifier trained to differentiate between SM events and NP events with Λ = 2 TeV is called CNN 2TeV and the rest are named in a similar way.

Prediction and Reach
After training, these CNN classifiers are used for prediction on pseudodata generated from the binned d.d.c.s. given in Table 16 of Ref. [33]. The appropriate binned d.d.c.s. is used to obtain the number of observed events in each bin, assuming an integrated luminosity of 35.8 fb$^{-1}$. We derive the central value for each bin by scaling an average normalised SM image, obtained from MG5, to this number of events. Assuming a Normal distribution with its mean at these central values and a 1σ error derived from the same d.d.c.s., we populate the bins with randomly drawn values within this range. For each of the samples (corresponding to different values of $\Lambda$), both 1σ and 2σ errors are considered.
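A hypothetical sketch of this pseudodata step (the d.d.c.s. values and errors below are placeholders, not the CMS numbers): each bin's expected event count is fluctuated with a Normal distribution within the quoted errors.

```python
import numpy as np

rng = np.random.default_rng(2)
lumi = 35.8e3                                  # integrated luminosity in pb^-1 (35.8 fb^-1)
ddcs = np.array([0.80, 0.40, 0.10, 0.02])      # placeholder binned d.d.c.s. (pb per bin)
err = 0.10 * ddcs                              # placeholder 1-sigma errors (pb per bin)

n_central = ddcs * lumi                        # expected events per bin
# Draw 1000 pseudo-experiments, each a fluctuated set of bin counts
pseudo = rng.normal(n_central, err * lumi, size=(1000, ddcs.size))
pseudo = np.clip(pseudo, 0.0, None)            # event counts cannot be negative
```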
We choose to generate 1000 images from this data for our CNN prediction, for each of the three NP classes of events ($\Lambda = 2, 3, 4$ TeV) and for the two error bands (1σ and 2σ). We can then note the probability with which the CNN models predict that these images are purely SM, i.e. contain no NP with that particular value of $\Lambda$, in each of the cases. This is done for both the 1σ error (giving us the probability $P^{1\sigma}_{SM}$) and the 2σ error (giving us $P^{2\sigma}_{SM}$). The results of our predictions are given in Table VIII.

TABLE VIII: Results of the CNN classifiers when used to make predictions on the pseudodata generated from the binned double differential cross-section given by CMS (Table 16 in Ref. [33]). The numbers show the probability that the images used for prediction are purely SM, with no NP. Refer to the text for more details.

The first classifier, CNN 2TeV, predicts that the images used for prediction are SM with 97.29% probability for the 1σ error, which essentially rules out the occurrence of NP with $\Lambda = 2$ TeV at that same level. This confidence drops to 80.26% when the 2σ error range is used. This tells us that the scale of any New Physics is greater than 2 TeV. The picture changes with the second classifier, CNN 3TeV, trained to differentiate between SM and NP events with $\Lambda = 3$ TeV: it classifies the data as SM only about 50% of the time, which is no better than a random choice between SM and NP. Thus, we cannot rule out the occurrence of NP with $\Lambda = 3$ TeV at either 1σ or 2σ. The same can be said for the third classifier, CNN 4TeV. The reach of this approach therefore lies somewhere between $\Lambda = 2$ TeV and $\Lambda = 3$ TeV. This is consistent with our findings with the Dense classifier, and offers an improvement over the bounds obtained using the $\chi^2$ analysis in the previous section for the case of the $t\bar{t}$ final state with no additional hard jets.

IV. CONCLUSION
The triple-gluon operator is the only CP-even purely gluonic dimension-6 effective field theory operator and is given by $O_G = f^{abc}\, G^{a,\mu}_{\ \ \nu}\, G^{b,\nu}_{\ \ \rho}\, G^{c,\rho}_{\ \ \mu}$. Among other vertices, it contributes to the triple-gluon vertex. In a New Physics context, such operators can arise from the presence of new heavy coloured particles or, more generically, when a gluonic form factor is present. Since any process involving gluon self-interactions is affected by the presence of this operator, several processes, such as dijet or multijet production or Higgs-associated production through the $Hgg$ vertex, can be studied to probe it.
Not all dijet processes are equally efficient at constraining the operator. The amplitude for the $gg \to q\bar{q}$ process interferes with the Standard Model one at $O(1/\Lambda^2)$, in contrast to $gg \to gg$ production, which does not interfere with the Standard Model at the lowest order. Moreover, the interference term for $gg \to q\bar{q}$ is proportional to the square of the quark mass. These two properties lead us to choose top quark pair production as our probe of choice. The choice is further bolstered by the fact that multiple b-quarks are produced as decay products of the two top quarks, and b-jets can be tagged with high efficiency at the LHC.
In this work, we employ a cut-and-count method. For this, we individually estimate the contributions to the cross-section due to the Standard Model alone, due to the New Physics operator alone, and due to the interference between them. This is done for the $pp \to t\bar{t}$ process, first with no additional hard jets arising from a hard parton at the matrix element level, and then including one such jet. After performing a $\chi^2$ analysis, we find that the strongest constraint obtained at 95% C.L. on the scale of New Physics for this operator is $\Lambda/\sqrt{c_G} > 3.6$ TeV.
Two different kinematic variables, $p_T(t_{high})$ and $m_{t\bar{t}}$, were used for binning the cross-section data in the $\chi^2$ analysis. Additionally, we investigate the New Physics bounds using machine learning techniques on $t\bar{t}$ events with no additional final-state jets. To this end, we employ two classifiers, a Dense Neural Network and a Convolutional Neural Network, to differentiate between purely Standard Model events and those with a contribution from the triple-gluon operator. We choose four benchmarks corresponding to $\Lambda = 2, 3, 4, 5$ TeV and note that the limits obtained using the classifiers are a slight improvement over those from the cut-and-count analysis using the $t\bar{t}$ final state with no additional jets.