Polarization fraction measurement in same-sign WW scattering using deep learning

Studying the longitudinally polarized fraction of $W^\pm W^\pm$ scattering at the LHC is crucial for examining the unitarization mechanism of the vector boson scattering amplitude through the Higgs boson and possible new physics. We apply here, for the first time, a Deep Neural Network classification to extract the longitudinal fraction. Based on fast simulation implemented with the Delphes framework, a significant improvement from the deep neural network is found to be achievable and robust across all dijet mass regions. A conservative estimate shows that a significance of four standard deviations can be reached with the High-Luminosity LHC design luminosity of 3000 fb$^{-1}$.

PACS numbers: 12.38.Cy, 12.38.-t, 13.85.Qk, 14.80.Bn

The High-Luminosity LHC (HL-LHC) will measure for the first time many novel processes predicted by the standard model (SM), and will study with precision those involving purely electroweak interactions, such as vector boson scattering (VBS). VBS is sensitive to non-Abelian weak gauge boson interactions and to the structure of electroweak symmetry breaking. Typical VBS signatures at hadron colliders include, for example, a large dijet mass ($m_{jj}$) and a large pseudorapidity separation ($\Delta\eta_{jj}$).
Among the various VBS processes, same-charge $W^\pm W^\pm$ production is one of the most promising channels for this purpose. The VBS $W^\pm W^\pm$ process profits from low background, owing to its signature of two same-sign charged leptons. Same-charge $W^\pm W^\pm$ scattering has been observed by CMS and ATLAS with a significance larger than 5 standard deviations, based on data collected at $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of approximately 35.9 fb$^{-1}$ [1] [2]. The dominant backgrounds after the VBS selection arise from WZ production with one lepton misidentified, and from non-prompt leptons from hadron decays; both can be further suppressed by requiring $m_{jj}$ to be above 1 TeV.
The next important goal after the discovery of VBS $W^\pm W^\pm$ is to measure the fraction of longitudinally polarized (LL) events. The LL component contributes only at the level of 5-10% in $W^\pm W^\pm \to W^\pm W^\pm$, but it is extremely interesting as a direct probe of the unitarization mechanism [3] of the vector boson scattering amplitude through the Higgs boson and possible new physics [4] [5].
There have been extensive studies of the LL fraction measurement, exploiting various kinematic observables. Popular variables include the leading lepton transverse momentum ($p_T^{l1}$) and the azimuthal angle difference between the two leading jets ($\Delta\phi_{jj}$). On top of these, ref. [6] proposed the variable $R_{p_T}$ (see ref. [6] for its definition). Ref. [7] examined the matrix element method to differentiate between beyond-SM scenarios. More recently, ref. [8] applied a regression with a Deep Neural Network (DNN) to recover the lepton angular distributions in the W boson rest frame, and showed that the expected accuracy can be improved by about a factor of two compared to the use of $R_{p_T}$.
In the meantime, CMS studied the prospects for a measurement of the LL fraction, based on full simulation samples with the upgraded CMS detector at the 14 TeV HL-LHC [9,10]. The expected significance for an integrated luminosity of 3000 fb$^{-1}$ is estimated to be 2.7 standard deviations. The study is based on a fit to the $\Delta\phi_{jj}$ distributions in two $m_{jj}$ bins.
In this study, we examine the impact of using a DNN on the LL fraction measurement. In contrast to ref. [8], we exploit DNN classification instead of regression, using the Keras library [11] with the TensorFlow backend [12]. We then perform a fit to the resulting DNN discriminant.
Machine learning techniques are increasingly applied in high energy physics, with some early examples in refs. [13,14]. Detailed studies are provided in this paper based on either low-level or high-level features. A comparison with boosted decision trees (BDT) [15] implemented in TMVA [16] is also provided.
Simulation samples are generated with MadGraph5_aMC@NLO [17], interfaced with Pythia 6 [18] for parton showering and hadronization and with Delphes version 3 [19] for detector simulation in the CMS configuration. As in ref. [8], we neglect 'pileup' effects due to overlapping interactions in proton-proton collisions, as they can be mitigated effectively with advanced experimental techniques. The inclusive $W^\pm W^\pm$ VBS samples are decomposed into LL, TT (transversely polarized $W^\pm W^\pm$) and TL (one transversely and one longitudinally polarized W) components, with the help of the DECAY package provided by MadGraph. We require exactly 2 same-sign charged leptons with $p_T > 20$ GeV and $|\eta| < 2.4$, and select the two leading jets with $p_T > 50$ GeV and $|\eta| < 4.7$ as VBS jet candidates. We further require $|\Delta\eta_{jj}| > 2.5$ and a b jet veto, and perform our studies in several benchmark selections: $m_{jj} > 850$, 1200, 1500, 1800, and 2000 GeV. The backgrounds from WZ and non-prompt leptons can indeed be suppressed effectively at higher $m_{jj}$.
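The selection described above can be illustrated with a short Python sketch. It is not the analysis code: the event layout (dictionaries with `pt`, `eta`, `phi`, `mass`, `charge` and `btag` keys) and the function names are hypothetical, and the b jet veto is assumed to apply to any jet passing the kinematic cuts.

```python
import math

def four_vector(pt, eta, phi, mass):
    """Return (E, px, py, pz) in GeV from collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def dijet_mass(jet1, jet2):
    """Invariant mass of the two-jet system."""
    v1 = four_vector(jet1["pt"], jet1["eta"], jet1["phi"], jet1["mass"])
    v2 = four_vector(jet2["pt"], jet2["eta"], jet2["phi"], jet2["mass"])
    e, px, py, pz = (a + b for a, b in zip(v1, v2))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def passes_vbs_selection(event, mjj_min=850.0):
    """Apply the lepton, jet, delta-eta, b-veto and m_jj requirements."""
    leptons = [l for l in event["leptons"]
               if l["pt"] > 20.0 and abs(l["eta"]) < 2.4]
    # Exactly two leptons, carrying the same electric charge.
    if len(leptons) != 2 or leptons[0]["charge"] != leptons[1]["charge"]:
        return False
    jets = sorted((j for j in event["jets"]
                   if j["pt"] > 50.0 and abs(j["eta"]) < 4.7),
                  key=lambda j: j["pt"], reverse=True)
    if len(jets) < 2:
        return False
    # Assumption: any b-tagged selected jet vetoes the event.
    if any(j.get("btag", False) for j in jets):
        return False
    lead, sublead = jets[0], jets[1]
    if abs(lead["eta"] - sublead["eta"]) <= 2.5:
        return False
    return dijet_mass(lead, sublead) > mjj_min

# Toy event: two positive leptons and two back-to-back forward jets.
example_event = {
    "leptons": [{"pt": 60.0, "eta": 0.5, "charge": 1},
                {"pt": 35.0, "eta": -1.0, "charge": 1}],
    "jets": [{"pt": 300.0, "eta": 2.0, "phi": 0.0, "mass": 0.0},
             {"pt": 300.0, "eta": -2.0, "phi": math.pi, "mass": 0.0}],
}
for threshold in (850.0, 1200.0, 1500.0, 1800.0, 2000.0):
    print(threshold, passes_vbs_selection(example_event, mjj_min=threshold))
```

The toy event has a dijet mass of roughly 2.26 TeV, so it passes all five benchmark thresholds quoted in the text.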
As inputs to the DNN, the low-level features include the $p_T$, $\eta$, $\phi$ of the two leptons, the $p_T$, $\eta$, $\phi$ and mass of the two jets, and the x- and y-components of the missing transverse energy ($\not\!E_T$). We further include high-level features: the Zeppenfeld variable [20] of the two leptons, $\Delta\phi_{jj}$, $\Delta\eta_{jj}$ and $\Delta R_{ll,jj}$. Four million events have been produced for training, validation and testing. Overtraining has been carefully checked by monitoring the dependence of the loss value on the training epoch, in both the training and validation datasets. Early stopping is applied when the loss value shows no improvement over the latest 20 epochs. Overtraining can also be checked precisely by comparing the output distributions of the training and test datasets.
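The early-stopping criterion can be sketched in a few lines of Python. This is an illustration only: it assumes the common patience-based convention (stop once the validation loss has failed to improve on its best value for 20 consecutive epochs), which may differ in detail from the configuration actually used; the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the 0-based epoch at which training would stop, or None.

    Assumption: training stops once the validation loss has not
    improved on its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Toy loss curve: improves for three epochs, then plateaus.
toy_losses = [1.0, 0.9, 0.8] + [0.85] * 30
print(early_stopping_epoch(toy_losses))
```

In this toy curve the best loss occurs at epoch 2, so the twentieth non-improving epoch, where training would stop, is epoch 22.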
Two differently structured DNN models, a 'dense' and a 'particle-based' model, have been trained and tested. For the dense model, we selected a neural network with 10 layers of 150 hidden units each, using the 'relu' activation function on the hidden units and the 'sigmoid' function on the final nodes, trained with the 'adam' optimizer at a learning rate of 0.001 and with an L2 regularization term of 0.01. Moreover, a batch size of 50 events, a 50% drop-out rate on the hidden units, and batch normalization are applied to avoid overtraining. As an alternative, to model efficiently the highly correlated variables of each particle, we also tried a particle-based model, which groups nodes separately for the features of each particle and gradually merges all nodes into bigger layers. Fig. 1 shows a simplified version of the particle-based model. The model actually used contains 2 hidden layers with 20 nodes for each particle. Leptons and jets are merged with 2 layers of 40 nodes before they are merged with the $\not\!E_T$ features. Finally, 4 layers of 180 nodes are added. Fig. 2 shows the receiver operating characteristic (ROC) curve, which is widely used as a measure of performance. As seen in Fig. 2, the particle-based model performs better than the dense model. Studies using a DNN model including both low- and high-level features have also been performed, but no significant improvement was found.
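To give a feeling for the size of the dense model, the following Python sketch counts the weights and biases of a fully connected network with the quoted shape. It rests on assumptions not stated explicitly in the text: 16 low-level inputs (three per lepton, four per jet, two for the missing transverse energy), 10 hidden layers, and a single sigmoid output node. Batch-normalization parameters are ignored, and the function name is illustrative.

```python
def dense_parameter_count(n_inputs, hidden_layers, hidden_units, n_outputs=1):
    """Count weights and biases of a fully connected network."""
    total = 0
    previous_width = n_inputs
    for _ in range(hidden_layers):
        # One weight per connection plus one bias per unit.
        total += previous_width * hidden_units + hidden_units
        previous_width = hidden_units
    total += previous_width * n_outputs + n_outputs
    return total

# Low-level inputs: (pT, eta, phi) x 2 leptons, (pT, eta, phi, mass) x 2 jets,
# and the x and y components of the missing transverse energy.
N_LOW_LEVEL_INPUTS = 2 * 3 + 2 * 4 + 2

n_parameters = dense_parameter_count(N_LOW_LEVEL_INPUTS, hidden_layers=10,
                                     hidden_units=150)
print(N_LOW_LEVEL_INPUTS, n_parameters)
```

Under these assumptions the network has about 2 x 10^5 trainable parameters, well below the four million simulated events available for training, validation and testing.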
Similar studies have been performed using a BDT with 1000 trees of maximum depth 5 and the 'Adaptive Boost' algorithm. Fig. 2 shows the performance of each discriminant variable. The calculated area under the curve (AUC) is 0.788, 0.762, 0.776, 0.666, and 0.591 for the particle-based DNN, the dense DNN, the BDT, $p_T^{l1}$, and $\Delta\phi_{jj}$, respectively. The particle-based DNN model has slightly more discriminative power than the BDT, and is much more powerful than the single variables $p_T^{l1}$ and $\Delta\phi_{jj}$. Fig. 3 shows several kinematic distributions of the DNN inputs, and the distribution of the DNN discriminant itself. One can clearly see that the DNN greatly improves signal-background discrimination compared to rectangular cuts.
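For readers unfamiliar with the AUC, the following minimal Python sketch computes it from two lists of discriminant scores, using the equivalent rank-based definition: the probability that a randomly chosen signal event scores higher than a randomly chosen background event, with ties counted as one half. The scores are toy values, not results from this study, and the function name is illustrative.

```python
def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5  # ties contribute one half
    return wins / (len(signal_scores) * len(background_scores))

# Toy discriminant values for signal-like (LL) and background-like events.
toy_signal = [0.9, 0.8, 0.4]
toy_background = [0.7, 0.3, 0.2]
print(roc_auc(toy_signal, toy_background))
```

An AUC of 0.5 corresponds to no separation and 1.0 to perfect separation, which puts the quoted values (0.591 for $\Delta\phi_{jj}$ up to 0.788 for the particle-based DNN) in context. The pairwise loop is quadratic in the number of events and is meant only for illustration.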
We perform a fit to the DNN output to extract the LL fraction. The estimated LL fraction and its accuracy are calculated assuming a 2% luminosity uncertainty and a 5% systematic uncertainty on both the LL and the TT+TL components. The DNN results are compared with methods based on $p_T^{l1}$ and $\Delta\phi_{jj}$. Examples of fit results for $m_{jj} > 1500$ GeV can be seen in Fig. 4; they are obtained with HistFactory [21] and cross-checked with RooFit [22].
Finally, we report the expected significance. As mentioned above, the VBS $W^\pm W^\pm$ process profits from lower background than other VBS channels, considering that the dominant backgrounds (WZ and hadron decays) are greatly suppressed and asymptotically negligible at high $m_{jj}$ [1,2,10]. Moreover, the contributions from those dominant backgrounds can be estimated in an experimental analysis and thus subtracted, keeping uncertainties under control. In the ranges $m_{jj} > 1500$ and 2000 GeV, significances of 5.2 and 4.1 standard deviations can be achieved from a likelihood fit to the DNN distributions. The same study has been performed with $p_T^{l1}$ and $\Delta\phi_{jj}$. Fig. 5 shows the greatly improved significance obtained with the DNN.
In summary, measuring the longitudinally polarized fraction of $W^\pm W^\pm$ scattering at the LHC is crucial for examining the unitarization mechanism of the vector boson scattering amplitude through the Higgs boson and possible new physics. We apply here, for the first time, a Deep Neural Network classification to extract the longitudinal fraction. Based on fast simulation implemented with the Delphes framework, a significant improvement from the DNN is found to be achievable and robust. With an integrated luminosity of 3000 fb$^{-1}$, the significance is found to reach 4 standard deviations in the high-$m_{jj}$ region, such as above 2 TeV, where backgrounds are negligible. With a combination of the CMS and ATLAS measurements at the HL-LHC, an observation above 5 standard deviations can be expected with the Deep Learning technique proposed in this study.
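The expectation for a CMS and ATLAS combination can be illustrated with a back-of-the-envelope estimate. The Python sketch below assumes that the two experiments reach equal, statistically independent significances that add in quadrature; this is a simplification made for illustration, not the combination procedure of this study, and correlated systematic uncertainties are ignored.

```python
import math

def combine_significances(*z_values):
    """Naive quadrature combination of independent significances."""
    return math.sqrt(sum(z * z for z in z_values))

# Assumption: both experiments reach the 4.1 standard deviations quoted
# for m_jj > 2000 GeV with 3000 fb^-1 each.
combined = combine_significances(4.1, 4.1)
print(round(combined, 1))
```

Under these assumptions the combined significance is about 5.8 standard deviations, consistent with the statement that a combined observation above 5 standard deviations can be expected.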