Detecting anomalous quartic gauge couplings using the isolation forest machine learning algorithm

The search for new physics (NP) beyond the Standard Model is one of the most important tasks of high energy physics. A common characteristic of NP signals is that they are usually few and kinematically different. We use a model-independent strategy to study the phenomenology of NP by directly picking out and studying the kinematically unusual events. For this purpose, the isolation forest (IF) algorithm is applied, which is found to be efficient in identifying the signal events of the anomalous quartic gauge couplings (aQGCs). The IF algorithm can also be used to constrain the coefficients of aQGCs. As a machine learning algorithm, the IF algorithm shows good prospects for future studies of NP.


I. INTRODUCTION
Despite the great success of the Standard Model (SM), there are still many unanswered questions making the search for new physics (NP) beyond the SM a very important issue [1].
Except for a few cases that are known or suspected to deviate from the SM [2], in most cases the experiments are in good agreement with the SM. The search for NP therefore amounts to looking for a small number of anomalies in a vast amount of experimental data. Meanwhile, the kinematic features of the events induced by NP are usually different from those of the SM, which is the reason that event selection strategies (ESSs) are commonly applied in the phenomenological studies of NP. From the perspective of the SM effective field theory (SMEFT) [3], this is because the signals are induced by new interactions different from those of the SM. It follows that the search for NP is a search for events which are 'few and kinematically different'. In this paper, we make use of these features of NP and use a model-independent strategy to directly search for kinematically unusual events. This strategy has the advantage that it can be applied generally. Take the SMEFT, which is also a model-independent way to search for NP signals, as an example. For one generation of fermions, there are 895 baryon number conserving dimension-8 operators [4,5], and a kinematic analysis needs to be done for each operator. In contrast, our strategy is independent of the operators. Moreover, unlike the search for NP in a specific process, which may turn out to be a wasted effort, the kinematically unusual events are always worth attention even if they are not from NP. They could be faults in the experiments, or rare processes allowed by the SM.
As an example, we use our strategy to study the anomalous quartic gauge couplings (aQGCs) [6], which are intensively studied modifications to the SM gauge interactions [4,7,8]. aQGCs can arise in many NP models, such as Born-Infeld theory, composite Higgs, warped extra dimensions, two Higgs doublet models, $U(1)_{L_\mu - L_\tau}$, and axion-like particles [9,10].
Since dimension-6 operators cannot contribute to aQGCs independently [4], we concentrate on the dimension-8 operators. A recent study shows that the dimension-8 operators are important from the point of view of the convex geometry of the SMEFT space [11]. Besides, there are cases sensitive to dimension-8 operators because the contributions from dimension-6 operators are absent [9,12,13]. In the case of anomalous gauge couplings, aQGCs can lead to richer helicity combinations than dimension-6 anomalous trilinear gauge couplings (aTGCs) [14]. Moreover, aQGCs can originate from tree diagrams while dimension-6 aTGCs are generated by loop diagrams [15]. Consequently, while the SMEFT has mainly been applied with dimension-6 operators, the importance of dimension-8 operators has been pointed out in many previous studies [5,6,12].
The search for 'few and kinematically different' events is in fact a form of anomaly detection (AD), whose applications in high energy physics (HEP) have recently been under extensive development [16]. AD is well suited to machine learning (ML), which has been used in various aspects of HEP [17]. For AD there are many algorithms, such as the autoencoder [18-20], the multivariate Gaussian mixture model [20,21], deep support vector data description [19,22], and the isolation forest (IF) [19,20,23]. We use the IF algorithm because the mechanism behind it is transparent: it merely identifies the points which are few and far away from the others. Moreover, it is expected to perform better with fewer signal events, and it is efficient to apply and easy to implement. We find that the IF algorithm works as an automatic ESS and can identify the signal events very well. Besides, the IF algorithm can also be applied to constrain the parameters of NP models, such as the coefficients of aQGCs, and therefore has a lot of potential in future studies of NP.
The rest of the paper is organized as follows. In Sec. II we briefly introduce the IF algorithm; in Sec. III, the application of the IF algorithm to detecting the signals of aQGCs is presented; Sec. IV is a summary.

II. A BRIEF INTRODUCTION OF ISOLATION FOREST
The IF algorithm is an algorithm with linear complexity designed for the detection of point anomalies. It makes use of the fact that the anomalies are 'few and different'. It can be applied to multi-dimensional data efficiently. We briefly introduce the IF algorithm following Ref. [23].
The key step of the IF algorithm is to build an ensemble of isolation trees (ITs). An IT is a randomly generated binary tree structure that isolates every single point. Denoting each point in the data set as $p^i = (x^i_1, x^i_2, \ldots, x^i_D)$, the construction procedure of an IT can be summarized as follows:
1. Put all points into a root node.
2. Randomly select a node which has not been partitioned yet.
3. Randomly select a dimension $d$ with $1 \le d \le D$.
4. Randomly set a split value $x$ with $\min(x^i_d) < x < \max(x^i_d)$, where $i$ runs over all points in this node.
5. Generate two child nodes; put the points with $x_d < x$ into the left child, and the others into the right child.
6. Repeat (2) to (5) until every node is either partitioned or filled with only one point.
In this paper, we do not set a maximum depth for the ITs. Once an IT is generated, the path length from a leaf to the root node can be used to determine whether the point represented by the leaf is an anomaly. The path lengths of anomalies are generally shorter than those of normal points.
Because an IT is constructed randomly, the path lengths of the points are not stable for a single IT. It is therefore more reliable to use multiple ITs, which together form an IF. The path lengths averaged over the ITs can then be used to discriminate the anomalies from the normal points.
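The construction of a single IT and the averaging over an ensemble can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this paper: the function names `tree_path_lengths` and `average_path_lengths`, the use of Python's `random` module, and the handling of duplicate points and of a split value that lands exactly on the minimum are assumptions made for illustration.

```python
import random

def tree_path_lengths(points, rng):
    """Grow one isolation tree (no depth limit) and return each point's leaf depth."""
    n_dim = len(points[0])
    depths = [0] * len(points)
    pending = [(list(range(len(points))), 0)]  # unpartitioned nodes: (point indices, depth)
    while pending:
        # Pick, at random, a node that has not been partitioned yet.
        members, depth = pending.pop(rng.randrange(len(pending)))
        # Only dimensions with a non-zero spread can separate the points.
        splittable = [d for d in range(n_dim)
                      if min(points[i][d] for i in members) < max(points[i][d] for i in members)]
        if not splittable:
            # Single point (or exact duplicates, an assumption): this node is a leaf.
            for i in members:
                depths[i] = depth
            continue
        d = rng.choice(splittable)              # random dimension
        lo = min(points[i][d] for i in members)
        hi = max(points[i][d] for i in members)
        x = rng.uniform(lo, hi)                 # random split value
        left = [i for i in members if points[i][d] < x]
        right = [i for i in members if points[i][d] >= x]
        if not left:
            # x fell exactly on the minimum: put the node back and draw again.
            pending.append((members, depth))
            continue
        pending.append((left, depth + 1))
        pending.append((right, depth + 1))
    return depths

def average_path_lengths(points, n_trees, seed=0):
    """Average the path length of every point over n_trees isolation trees."""
    rng = random.Random(seed)
    totals = [0] * len(points)
    for _ in range(n_trees):
        for i, depth in enumerate(tree_path_lengths(points, rng)):
            totals[i] += depth
    return [t / n_trees for t in totals]
```

For a compact cluster plus one distant point, the distant point is expected to receive the shortest average path length, which is the behaviour the text relies on.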
There are only two variables in this algorithm: the number of ITs and the size of the data set. As will be shown later, both can be made independent of the problem.
More details and extensions of the IF algorithm can be found in Refs. [23,24].

III. APPLICATION OF IF ALGORITHM ON THE SEARCH OF AQGCS
The IF algorithm can be applied to many different NP models. In the absence of clear signs of NP, we use the detection of aQGCs as an example.
A. aQGC signals in the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$
The vector boson scattering (VBS) processes at the LHC are very suitable for searching for aQGCs [4,25]. They have been extensively studied by both the ATLAS and CMS groups, and the effort will continue with future runs of the LHC. After the first evidence of a VBS process at the LHC was found in 2014 [26], a number of experimental results on VBS processes have been obtained [14,27].
Recently, evidence of the exclusive or quasi-exclusive $\gamma\gamma \to W^+W^-$ process has been found [28]. As an illustration, we concentrate on this process at $\sqrt{s} = 13$ TeV. The next-to-leading order QCD corrections to the process $pp \to W^+W^-jj$ have been computed [29], and the $K$ factor is found to be close to one ($K \approx 0.98$). There are some difficulties in the phenomenological studies of NP in this process, because the presence of two neutrinos in $\ell^+\ell^-\nu\bar{\nu}jj$ makes the reconstruction of the two $W$ bosons almost impossible. However, these difficulties provide a good test for the IF algorithm.
The Lagrangian relevant to this process is [equation missing in source]. The subprocess $\gamma\gamma \to W^+W^-$ can be affected by the aQGCs via five vertices [vertex definitions missing in source], with corresponding coefficients [expressions missing in source]. Because each dimension-8 operator contributes to only one vertex, and because the experimental constraints on dimension-8 operators are obtained by assuming one operator at a time, the constraints on $\alpha_i$ can be derived from the constraints on dimension-8 operators [8] and are listed in Table I.
TABLE I: The constraints on vertices and the corresponding limits on the dimension-8 operators at 95% CL.

B. Detection of the signals
In this subsection, we assume the existence of the aQGCs and investigate whether the signal events can be picked out by the IF algorithm. The dominant signal is $W^+W^-jj$ production induced by aQGCs with leptonic decays of the $W^\pm$ bosons, as shown in Fig. 1(a). This process can also receive contributions from the triboson production induced by aQGCs shown in Fig. 1(b). The background is the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ in the SM; typical diagrams are shown in Fig. 2(a). In addition, we also consider $t\bar{t}$ production with the $b$-jets mistagged, as depicted in Fig. 2(b); the $b$-tag efficiency is assumed to be 77% [32]. For simplicity, we neglect the triboson channel induced by aQGCs and the interference between the contributions from aQGCs and the SM, which were found to be negligible [8]. In the following we consider one operator at a time; therefore the interferences between different aQGCs are also neglected.
The events are generated by Monte Carlo (MC) simulation with MadGraph5_aMC@NLO [33], including a parton shower with Pythia 8.2 [34] and a CMS-like detector simulation with Delphes [35]. The basic cuts are the same as the default settings of MadGraph5_aMC@NLO.
To ensure reliability, we require the particles in the final states to satisfy $N_{\ell^\pm} \ge 1$ and $2 \le N_j \le 5$, where $N_{\ell^\pm}$ are the numbers of (anti)leptons and $N_j$ is the number of jets.
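This requirement is denoted as the $N_{\ell,j}$ cut. The cross sections of the signals and backgrounds after this cut are listed in Table II. For illustration, we concentrate on the $V_{0,3}$ vertices, which originate from the $O_{M_i}$ and $O_{T_i}$ operators, respectively. Denoting $N_{\rm SM}$, $N_{t\bar{t}}$ and $N_{\rm aQGC}$ as the event numbers of the SM background, the $t\bar{t}$ background and the signal, we generate the events in the ratio $N_{\rm SM} : N_{t\bar{t}} : N_{\rm aQGC} = \sigma_{\rm SM} : \sigma_{t\bar{t}} : \sigma_{\rm aQGC}$, where $\sigma_{\rm SM}$, $\sigma_{t\bar{t}}$ and $\sigma_{\rm aQGC}$ are the cross sections of the SM background, the $t\bar{t}$ background and the signal, respectively. For the signals, we keep $N_{\rm aQGC} = 50$ after the $N_{\ell,j}$ cut, and therefore the data sets consist of events with $N_{\rm SM} : N_{t\bar{t}} : N_{V_0} = 16846 : 23654 : 50$ and $N_{\rm SM} : N_{t\bar{t}} : N_{V_3} = 37390 : 52500 : 50$. Each event in the data set is assembled straightforwardly and consists of 18 attributes, which are the components of the missing transverse momentum $\not{p}_T$, the 4-momenta of the hardest two jets $p_{j_1}$ and $p_{j_2}$, and the 4-momenta of the hardest lepton and antilepton, $p_{\ell^+}$ and $p_{\ell^-}$.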
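The composition of such a data set follows from scaling the background event numbers to a fixed number of signal events. The short sketch below illustrates this arithmetic only. The function name `event_numbers` and the cross sections in the example are hypothetical and are not the values of Table II.

```python
def event_numbers(sigma_backgrounds, sigma_signal, n_signal=50):
    """Background event counts such that N_bkg : N_signal = sigma_bkg : sigma_signal."""
    return {name: round(n_signal * sigma / sigma_signal)
            for name, sigma in sigma_backgrounds.items()}

# Hypothetical cross sections in fb, chosen only to illustrate the scaling.
counts = event_numbers({"SM": 40.0, "ttbar": 60.0}, sigma_signal=0.1)
```

Read in the other direction, the quoted composition 16846 : 23654 : 50 implies a ratio of SM background to $V_0$ signal cross sections of 16846/50, roughly 337, after the cut.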
There are two parameters in the IF algorithm. One is the number of trees, denoted as $n$, which is a model-independent parameter used to control the accuracy of the IF algorithm. Denoting the path length as $L$, we find that $L$ converges quickly with growing $n$. Picking one event out of each of the SM background, the $t\bar{t}$ background and the $V_0$ signal, as shown in Fig. 3, $L$ becomes stable after constructing about 1000 trees. In this paper, we use $n = 2000$, for which the relative standard errors of $L$ are about 1% (0.4% to 1.4%) for each point.
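One way to quantify this convergence is the relative standard error of the mean path length of a point over the $n$ trees. The sketch below assumes that "relative standard error" means the standard error of the mean divided by the mean. The text does not spell this out, and the function name is hypothetical.

```python
import math
import statistics

def relative_standard_error(path_lengths):
    """Standard error of the mean path length, divided by the mean.

    path_lengths holds the path length of one point in each isolation tree.
    """
    n = len(path_lengths)
    mean = statistics.mean(path_lengths)
    return statistics.stdev(path_lengths) / math.sqrt(n) / mean
```

Under this definition the error falls like $1/\sqrt{n}$, so going from 1000 to 2000 trees reduces it by a factor of about 1.4.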
The other parameter is the size of the data set. An anomaly score (denoted as $a$) which is independent of the size of the data set can be defined by normalizing the average path length (denoted as $L$) with the average depth of an isolation tree $c(N)$ as $a = 2^{-L/c(N)}$, where $N$ is the size of the data set and $c(N) = 2H(N-1) - 2(N-1)/N$ [23], where $H(N)$ is the harmonic number. $a$ is bounded in $(0, 1)$. The larger $a$ is, the more likely the corresponding event is an anomaly.
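The normalization can be written out in a few lines. The sketch below assumes the form $c(N) = 2H(N-1) - 2(N-1)/N$ used in the original isolation forest paper [23] and evaluates the harmonic number by direct summation. The function names are hypothetical.

```python
import math

def harmonic(i):
    """Harmonic number H(i) = 1 + 1/2 + ... + 1/i, by direct summation."""
    return math.fsum(1.0 / k for k in range(1, i + 1))

def average_depth(n_points):
    """c(N) = 2 H(N-1) - 2 (N-1)/N, assumed here following Ref. [23]."""
    if n_points < 2:
        raise ValueError("c(N) is only meaningful for N >= 2")
    return 2.0 * harmonic(n_points - 1) - 2.0 * (n_points - 1) / n_points

def anomaly_score(mean_path_length, n_points):
    """a = 2^(-L / c(N)); a shorter average path length gives a larger score."""
    return 2.0 ** (-mean_path_length / average_depth(n_points))
```

With this definition a point whose average path length equals $c(N)$ has $a = 0.5$, and $a$ approaches 1 as the path length approaches zero.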
The normalized distributions of $a$ are shown in Fig. 4. We find that in both cases of $V_0$ and $V_3$, the values of $a$ for the backgrounds are very different from those for the signals. One can set a minimal anomaly score $a_{\min}$ and use $a > a_{\min}$ to pick out the signal events of aQGCs. The compositions of the selected events are shown in Fig. 5. For both cases, with $a_{\min} = 0.6$, about half of the selected events are signal events. We find that the IF algorithm is able to pick out the signal events without knowledge of the NP, as long as the signal exists.

C. Use the IF algorithm as an event selection strategy
The effect of the IF algorithm is similar to that of an event selection strategy (ESS). Different from a traditional ESS, for the IF algorithm there is no need to study the kinematic features of the signal. In the search for NP, the signal significance is widely used, which is defined as $S_{\rm stat} = N_s/\sqrt{N_{\rm bg} + N_s}$, where $N_{s,{\rm bg}}$ are the event numbers of the signal and the background. Similarly, a luminosity-independent quantity can be defined as $\hat{S}_{\rm stat} = \sigma_s/\sqrt{\sigma_{\rm bg} + \sigma_s}$ such that $S_{\rm stat} = \sqrt{l}\,\hat{S}_{\rm stat}$, where $l$ is the luminosity. In this paper, we use $\hat{S}_{\rm stat}$ to evaluate the ESS. By selecting events with $a > a_{\min}$, $\hat{S}_{\rm stat}$ for $V_{0,3}$ are shown in Fig. 6. $\hat{S}_{\rm stat}$ can reach $0.630\;{\rm fb}^{1/2}$ at $a_{\min} = 0.617$ for $V_0$ and $0.425\;{\rm fb}^{1/2}$ at $a_{\min} = 0.642$ for $V_3$.
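The scan over $a_{\min}$ can be sketched as follows. The sketch assumes that every MC event carries the same cross-section weight, which holds when the events are generated in proportion to the cross sections. The function names and the argument `sigma_per_event` are hypothetical.

```python
import math

def s_hat(scores, is_signal, sigma_per_event, a_min):
    """sigma_s / sqrt(sigma_bg + sigma_s) for the events with a > a_min."""
    n_s = sum(1 for a, sig in zip(scores, is_signal) if sig and a > a_min)
    n_bg = sum(1 for a, sig in zip(scores, is_signal) if not sig and a > a_min)
    total = sigma_per_event * (n_s + n_bg)
    return sigma_per_event * n_s / math.sqrt(total) if total > 0 else 0.0

def best_cut(scores, is_signal, sigma_per_event, candidates):
    """Candidate a_min that maximizes the luminosity-independent significance."""
    return max(candidates,
               key=lambda a_min: s_hat(scores, is_signal, sigma_per_event, a_min))

def significance(s_hat_value, luminosity):
    """S_stat = sqrt(l) * S_hat, with l in fb^-1 and S_hat in fb^(1/2)."""
    return math.sqrt(luminosity) * s_hat_value
```

For example, $\hat{S}_{\rm stat} = 0.630\;{\rm fb}^{1/2}$ corresponds to $S_{\rm stat} \approx 6.3$ at a luminosity of $100\;{\rm fb}^{-1}$.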
We compare the IF algorithm with the ESS designed for the aQGCs in the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ proposed in Ref. [8], which is given by Eq. (5) [equation missing in source], where $M_{jj}$ and $\Delta y_{jj}$ are the invariant mass of and the rapidity difference between the hardest two jets, $\phi_{LM}$ is the angle between the sum of the transverse momenta of the charged leptons and $\not{p}_T$, $\theta_{\ell\ell}$ is the angle between the charged leptons, and [definitions missing in source] with $E_{\ell^\pm}$ the energies of the charged leptons. For the IF algorithm, we select events with $a > 0.617$ and $a > 0.642$ for $V_0$ and $V_3$, respectively; the resulting sets are denoted as $S_{\rm IF}$. The sets consisting of events selected by Eq. (5) are denoted as $S_{\rm ESS}$. The numbers of events in those sets are shown in Fig. 7. One can see that the events picked by the IF algorithm are not quite the same as those picked by the ESS in Eq. (5), especially for the backgrounds. The $\hat{S}_{\rm stat}$ for $V_{0,3}$ with Eq. (5) are $0.691\;{\rm fb}^{1/2}$ and $0.341\;{\rm fb}^{1/2}$, respectively.
FIG. 7: Difference between the events selected by using Eq. (5) and by using anomaly scores. The blue bars show the number of events selected by Eq. (5) but not by the anomaly score cuts, the yellow bars show the number of events selected by the anomaly score cuts but not by Eq. (5), and the orange bars show the number of events selected by both methods.
Comparing these with the results of the IF algorithm, we find that the selection using the anomaly score shows a competitive ability in discriminating the signals, especially when the signal events are fewer.
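The three categories of events counted in Fig. 7 correspond to simple set operations on the two selections. The sketch below assumes that each event carries a unique identifier. The function name and the dictionary keys are hypothetical.

```python
def compare_selections(selected_ess, selected_if):
    """Count events chosen only by the ESS, only by the anomaly score cut, or by both."""
    ess, iso = set(selected_ess), set(selected_if)
    return {
        "ess_only": len(ess - iso),  # selected by Eq. (5) but not by the score cut
        "if_only": len(iso - ess),   # selected by the score cut but not by Eq. (5)
        "both": len(ess & iso),      # selected by both methods
    }
```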
In the above we use 18 attributes, which is straightforward but not optimized. There are usually observables more sensitive to the signal, which depend on the model or operators one looks for. For example, knowing that we are searching for aQGCs, we can use attributes such as $M_{o1}$, $|\not{p}_T|$ and $p_{\ell^+} \cdot p_{\ell^-}$. By choosing only two attributes, $|\not{p}_T|$ and $p_{\ell^+} \cdot p_{\ell^-}$, the events can be represented by points in a 2D space and are therefore easy to visualize. Applying the IF algorithm to these attributes, the distributions of events with different anomaly scores are shown in Fig. 8; one can see that the events with the higher anomaly scores are indeed those far away from the others. The distributions of the events from the backgrounds and the signal are also shown in Fig. 8, which indicate that the events far away from the others are indeed the signal events.
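The two attributes of the 2D example can be computed from the event record as sketched below. The component ordering $(E, p_x, p_y, p_z)$, the metric signature $(+,-,-,-)$, and the dictionary keys `met`, `lep_plus` and `lep_minus` are assumptions made for illustration and are not taken from the analysis code.

```python
import math

def missing_pt(px_miss, py_miss):
    """Magnitude of the missing transverse momentum."""
    return math.hypot(px_miss, py_miss)

def minkowski_dot(p, q):
    """Product of two 4-momenta (E, px, py, pz) with metric (+, -, -, -)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]

def two_attributes(event):
    """Map an event to the point (|missing pT|, p_l+ . p_l-) in the 2D space."""
    return (missing_pt(*event["met"]),
            minkowski_dot(event["lep_plus"], event["lep_minus"]))

# Hypothetical event: two back-to-back massless leptons of 10 GeV each.
example = {"met": (3.0, 4.0),
           "lep_plus": (10.0, 0.0, 0.0, 10.0),
           "lep_minus": (10.0, 0.0, 0.0, -10.0)}
point = two_attributes(example)
```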

D. Set constraints on the coefficients
A more common scenario is that signal events are not observed and one needs to set constraints on the parameters of NP models and the coefficients of operators. This can also be done with the help of the IF algorithm, because the mechanism behind the IF algorithm suggests that the anomaly scores of the backgrounds should not be sensitive to the signal events. Consequently, after constructing an IF for the MC data of the backgrounds, which is model independent, one can use the anomaly score as a cut; the expected cross section after this cut can be calculated and compared with the cross section obtained by experiments under the same cut. However, when it comes to constraining the parameters of a specific model, we need information about this model, so the procedure is no longer model independent.
We denote by $a_0$ and $a_{50}$ the anomaly scores of events for the $N_{\rm aQGC} = 0$ and $N_{\rm aQGC} = 50$ data sets, respectively. The distributions of $a_0 - a_{50}$ for the backgrounds are shown in Fig. 9. We find that the anomaly scores of the backgrounds increase slightly without the signal events.
For $V_0$, $0 < a_0 - a_{50} < 0.075$, and for $V_3$, $-0.005 < a_0 - a_{50} < 0.065$. Since the anomaly scores for the backgrounds increase a little, we use $a_{\min} = 0.68$ for $V_0$ as a cut, and $a_{\min} = 0.70$ for $V_3$. The cross sections after this cut are shown in Fig. 10. Using the cross sections, one can obtain the signal significance, which is

IV. SUMMARY
As more and more data are collected at colliders, it becomes increasingly important to simplify the search for NP signals. In this paper, we investigate a model-independent approach to searching for NP signals which exploits their characteristics: few and kinematically different. We use an unsupervised ML algorithm, the IF algorithm, to find the kinematically unusual events directly.
The IF algorithm is transparent and easy to apply. This approach has the advantage that the suspected signals of NP can be picked out without knowledge of the NP models. It works as an automatic ESS which can be generally applied. We also show that the IF algorithm can be applied to constrain the parameters of NP models and the coefficients of the operators. Apart from that, the kinematically unusual events picked out are always worth studying. There are also some limitations in this approach. When anomalies appear, one needs to look deeper into them to find out where they originate. Beyond that, there is room for improvement in how the data are organized. In this paper, we directly use the components of the 4-momenta of the particles in the final state.
We use the dimension-8 operators contributing to the aQGCs as examples to investigate the capabilities of this approach. The process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ is chosen as an arena, which has some complexity due to the neutrinos in the final state. It is shown that the anomaly scores of the background events are generally smaller than those of the signal events. With a minimal allowed anomaly score as a cut, the signal events can be selected efficiently. The IF algorithm shows greater ability to highlight the signal events and constrain the coefficients of the operators compared with the ESS designed for the aQGCs in this process. In addition, we also show that the IF algorithm performs better with fewer signal events. The IF algorithm and other machine learning methods can be very promising tools in future studies of high energy physics.

FIG. 1: Feynman diagrams for the signal: (a) $W^+W^-jj$ production induced by aQGCs; (b) triboson production induced by aQGCs.
FIG. 2: Typical Feynman diagrams for backgrounds: (a) the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ in the SM; (b) $t\bar{t}$ production with the $b$-jets mistagged.

FIG. 3: $L$ as a function of $n$ for the $V_0$ data set. Three events from different sources are picked randomly as examples. $L$ is the average path length and $n$ is the number of isolation trees.

FIG. 4: Normalized distributions of $a$; the left panel is for $V_0$ and the right panel is for $V_3$.

FIG. 6: $\hat{S}_{\rm stat} = \sigma_s/\sqrt{\sigma_{\rm bg} + \sigma_s}$ as functions of $a_{\min}$, where $\sigma_{s,{\rm bg}}$ are the cross sections of the aQGC contribution and the backgrounds after a cut on the anomaly score $a > a_{\min}$.

FIG. 9: The distributions of $a_0 - a_{50}$ for the backgrounds, where $a_0 - a_{50}$ are the changes of the anomaly scores from a data set without signal events to a data set with 50 signal events.

TABLE II: The cross sections of the signals and backgrounds after the $N_{\ell,j}$ cut. For illustration, the $V_{0,3}$ vertices, which originate from the $O_{M_i}$ and $O_{T_i}$ operators, respectively, are considered; the events are generated in the ratio $N_{\rm SM} : N_{t\bar{t}} : N_{\rm aQGC} = \sigma_{\rm SM} : \sigma_{t\bar{t}} : \sigma_{\rm aQGC}$ of the SM background, the $t\bar{t}$ background and the signal, with $N_{\rm aQGC} = 50$ kept for the signals.