Exploring SMEFT in VH with Machine Learning

In this paper we study the use of Machine Learning techniques to exploit kinematic information in VH, the production of a Higgs boson in association with a massive vector boson. We parametrise the effect of new physics in terms of the SMEFT framework. We find that the use of a shallow neural network allows us to dramatically increase the sensitivity to deviations in VH with respect to previous estimates. We also discuss the relation between the usual measures of performance in Machine Learning, such as AUC or accuracy, and the more appropriate measure of Asimov significance. This relation is particularly relevant when parametrising systematic uncertainties. Our results show the potential of incorporating Machine Learning techniques into SMEFT studies using the current datasets.


I. INTRODUCTION
The Particle Physics community holds high hopes of discoveries at the Large Hadron Collider (LHC), the machine colliding protons at the highest energies in an Earth laboratory. Yet, after years of intense effort searching for new phenomena, no clear evidence of new physics has been found.
To continue the search for new phenomena and improve the exploitation of the LHC data, we are shifting our focus from the low-hanging fruit, e.g. resonance searches, to more subtle (indirect) effects of new physics. A well-defined approach to interpreting data in terms of indirect probes is the framework of Effective Field Theories [1], in particular the Standard Model EFT (SMEFT) [2].
In a nutshell, the SMEFT is a consistent way of exploring new theories as deformations from the SM structures, with a large number of possible SM deviations taken into account.
As an example, in the SMEFT approach the Higgs couplings to vector bosons V = W, Z would be modified as in Eq. 1, which at the level of the Lagrangian is equivalent to adding to the SM Lagrangian new terms suppressed by a scale of new physics, Eq. 2, where H is the Higgs SU(2) doublet and $W^k$ is the electroweak gauge boson triplet. One could trace the ultimate origin of these deformations to many different types of new physics, just too heavy to be discovered directly at the LHC. For example, the deformation (aka Wilson coefficient) $c_{HW}$ could be the manifestation of a new set of scalar particles, such as in 2HDMs, too heavy or too complex to be seen in direct production, but still felt via virtual effects such as the one-loop contribution shown in Fig. 1 [3].

FIG. 1. An example of how operators like $c_{HW}$ could arise from new theories. $H_{1,2}$ denotes the light (heavy) Higgs.
These new theories would then manifest themselves in the LHC environment as subtle deviations in physical observables, often in kinematic regions where the theoretical and experimental understanding is particularly poor. In contrast with a resonance search in a final state, SMEFT analyses force us to deal with the LHC's inherently complex environment, where the understanding of extreme kinematic regions is required.
In the context of SMEFT effects in the Higgs sector, LHC analyses have moved from the basic use of total cross-sections (κ formalism [4]) to the understanding that pushing the boundaries of the SMEFT means using kinematic information [5]. Even that frontier is becoming a well-trodden path now that Run2 has finished, and searches for new physics in SMEFT effects are moving towards identifying even subtler effects by looking at multidimensional information [6] and combining as many channels as possible [7].
This state of affairs, the need to quickly identify subtle effects in multidimensional distributions of information, clearly calls for artificial intelligence methods, particularly the use of data mining techniques in Machine Learning [8]. The amount of information one single channel can provide is limited, though. Even in a complex final state such as Vector Boson Fusion (VBF), with all the multidimensional correlations one can think of in this channel, the amount of information quickly saturates [6]. This is just a manifestation of the fact that the kinematics of the final state particles (input information) satisfies a number of constraints (energy-momentum conservation, behaviour of parton distribution functions, experimental selection cuts and resolution), which limits the usefulness of single channels. In VBF, we showed the inherent limitations in a Bayesian context [6], and recently the authors of Refs. [9] pioneered the use of Machine Learning to identify SMEFT effects in VBF, including data augmentation.

arXiv:1902.05803v1 [hep-ph] 15 Feb 2019
Despite its importance for understanding the electroweak sector, the measurement of Higgs production in VBF is not a reality yet, hence studies are based on future prospects. On the other hand, the production of the Higgs in association with a massive vector boson, or VH, is already firmly established [10,11]. As quality kinematic information in WH and ZH, with better statistics and experimental understanding, will become available before VBF production is understood, we believe the approach of this paper is a first step to push the boundaries of our understanding of SMEFT effects on Higgs LHC data, to be complemented later on with VBF information.
In this paper, we will illustrate the use of Machine Learning techniques in VH by switching on a single SMEFT effect in WH and ZH. In the past few months, we have witnessed an explosion of works by the HEP community on the use of Machine Learning techniques, e.g. Refs. [12], and the analyses have quickly become more and more sophisticated. Although in this paper we use state-of-the-art techniques, we expect our results in VH will be surpassed by other works in the near future.
This paper is organised as follows. In Sec. II, we describe the current status of the SMEFT analyses and the experimental understanding of the VH channel. In Sec. III, we then move to describe the sort of kinematic information one could use in VH. The Machine Learning analysis, in particular the use of a shallow neural network, is described in Sec. IV and Appendix A, where we provide a simple glossary of terms used in this paper. We present our results in Sec. V, and discuss possible new directions in Sec. VI.

II. CURRENT STATUS: LIMITS ON THE SMEFT, AND THE VH AT THE LHC
We are going to illustrate the techniques using a particular deformation, the operator in Eq. 2 with Wilson coefficient $c_{HW}$. It is currently constrained to values in the range [7] (individual constraint)

$$c_{HW} = 0.002 \pm 0.014 .$$

In this paper we will often illustrate points using a benchmark within the 2σ region:

$$c_{HW} = 0.03 .$$

The limits on SMEFT operators were obtained by performing a global fit including kinematic information on VH [13] and electroweak WW production at LEP2 and LHC [14], but using only 40 fb$^{-1}$ of data, half of the total Run2 dataset. A more recent global analysis was done by the groups in Refs. [15,16], but their analysis did not substantially change the limit on $c_{HW}$. On the other hand, sensitivity studies of future colliders such as HL-LHC show that these limits will be pushed to a few times smaller than the current limit [17].

On the experimental side, the ATLAS [10] and CMS [11] collaborations have marked yet another milestone in their quest to understand electroweak symmetry breaking: the observation of the Higgs decaying into two b-quarks. This measurement was done by combining a challenging set of channels collectively denoted by VH, which corresponds to the Higgs produced in association with a massive vector boson V = Z or $W^\pm$. The final states are classified as 0L ($Z \to \nu\bar{\nu}$), 1L ($W \to \ell\nu$) and 2L ($Z \to \ell^+\ell^-$). The combination of all the channels can be summarised as the ratio of the observed cross-section to the SM expectation, $\mu_{VH}$. For example, the ATLAS measurement reads $\mu_{VH}$(ATLAS) = 1.01 + 0.12 (stat.)

III. KINEMATIC INFORMATION IN VH
Right after the discovery of the Higgs boson in the summer of 2012, the VH channel was identified as an important source of information to search for anomalous behaviour of the Higgs. In particular, the distribution of the transverse momentum of the vector boson, $p_T^V$, was identified as very sensitive to new physics, even to the point of reviving Tevatron searches which had failed to unveil the Higgs boson [5].
When Run1 LHC data started to place limits on the VH channel, the non-observation of deviations in the high-$p_T$ regions was also used to inform our global understanding of SMEFT theories [13,14] and the CP properties of the Higgs [18]. Moreover, the understanding and classification of the $p_T^V$ distributions was crucial for the Run2 collaborations to achieve a measurement [10,11].
As mentioned in the Introduction, the VH channel would seem qualitatively less interesting than the VBF channel, where the forward jets enrich the overall kinematic information. Nonetheless, VH with its three slightly different channels also offers interesting kinematic information; Fig. 2 shows a few examples of distributions we will use later in our analysis.
Moreover, progress is made in stages, and the experimental understanding of the VBF channel is nowhere close to that of VH. In VH, huge SM reducible backgrounds, such as Z plus heavy flavour production, have been studied and kept under control thanks to the tremendous ingenuity of the experimental collaborations. The recent observation of the Higgs then sets a new stage for the VH channel, where new physics can be searched for and contrasted with SM Higgs production.

IV. USING A SHALLOW NEURAL NETWORK
In this section we will describe the methodology we developed to study the SMEFT in VH using Machine Learning, in particular a shallow neural network. To help the novice reader, in Appendix A we have collected a glossary of terms alongside brief explanations of their meaning.
To extract the maximum amount of information from the kinematic features, one needs to combine multidimensional information such as that shown in Fig. 2 in 0D, 1D, 2D and even higher dimensionalities. The objective is to maximise our ability to detect new phenomena, which in HEP means maximising the significance of an observation. Given a number of signal events s, where signal here represents the SM plus a deviation like $c_{HW}$, and a number of background events b, one can use the Asimov estimate of significance [19] as a measure to maximise, see Appendix A. Our problem then consists of building a function, the inverse of the Asimov significance, and finding its true minimum inside a complex parameter space by including information from a diverse set of observables.
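As an illustration of the quantity being maximised, the following minimal sketch evaluates the Asimov significance for a counting experiment with s signal events, b background events and an absolute background uncertainty sigma_b. It uses the commonly quoted closed form, which we assume matches the definition referred to in Appendix A. The function name `asimov_significance` and the example yields are hypothetical and serve only as an illustration.

```python
import math


def asimov_significance(s, b, sigma_b=0.0):
    """Asimov estimate of the median discovery significance.

    s       -- expected number of signal events
    b       -- expected number of background events (must be > 0)
    sigma_b -- absolute uncertainty on b (0 means no systematic uncertainty)
    """
    if b <= 0:
        raise ValueError("background yield b must be positive")
    if s <= 0:
        return 0.0
    n = s + b
    if sigma_b == 0.0:
        # Limit of vanishing background uncertainty.
        z2 = 2.0 * (n * math.log(n / b) - s)
    else:
        var = sigma_b ** 2
        term1 = n * math.log(n * (b + var) / (b * b + n * var))
        term2 = (b * b / var) * math.log(1.0 + var * s / (b * (b + var)))
        z2 = 2.0 * (term1 - term2)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(z2, 0.0))


# Hypothetical yields: 10 signal events on top of 100 background events.
z_stat_only = asimov_significance(10.0, 100.0)
z_with_syst = asimov_significance(10.0, 100.0, sigma_b=0.5 * 100.0)  # 50% systematic
```

In this toy example the statistics-only value is close to the naive s/sqrt(b) = 1, whereas a 50% background uncertainty pulls the significance down to roughly s/sigma_b, which is why the treatment of systematics matters so much for the final sensitivity.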
Similarly to the procedure described in Ref. [20], we use a shallow neural network (NN) built from one hidden layer with a number of neurons equal to the number of kinematic observables we consider. To set the best hyperparameters, instead of performing a brute-force grid search as in [20], we make use of Distributed Evolutionary Algorithms in Python (DEAP) [21], in addition to the Scikit-Learn library. As activation function, we found a rectifier function (max(x, 0)) to perform better than the typical sigmoid and other logistic options. For optimisation, we found Adam to be the best performing. Other minor adjustments were made to the batch size and the dropout options, see Appendix A.
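The architecture described above can be sketched in a few lines. This is a minimal, untrained forward pass only, written in plain Python rather than with the libraries used in the analysis. The sigmoid output node, the weight initialisation and all names (`init_shallow_net`, `forward`) are assumptions made for illustration; dropout, the Adam optimiser and the hyperparameter search are not shown.

```python
import math
import random


def relu(x):
    """Rectifier activation, max(x, 0)."""
    return max(x, 0.0)


def sigmoid(x):
    """Numerically stable logistic function (assumed output activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_shallow_net(n_features, seed=0):
    """One hidden layer with as many neurons as input observables."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 / n_features)  # assumption: He-style initial spread
    return {
        "w1": [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_features)],
        "b1": [0.0] * n_features,
        "w2": [rng.gauss(0.0, scale) for _ in range(n_features)],
        "b2": 0.0,
    }


def forward(net, features):
    """Classifier output in (0, 1) for one event."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + bias)
        for row, bias in zip(net["w1"], net["b1"])
    ]
    return sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])


# Hypothetical event with four standardised kinematic observables.
net = init_shallow_net(n_features=4)
score = forward(net, [0.3, -1.2, 0.8, 0.1])
```

Tying the hidden-layer width to the number of observables keeps the number of trainable parameters small, which is consistent with the observation later in the text that deeper networks tended to overfit.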
We then fed the algorithm with a large number of simulated events, both signal and background. The events had a number of characteristics, including the $p_T$ of the objects (b-jets, leptons, missing energy) and combinations of different objects. The identity of the event (signal or background) was used by the algorithm as part of the training, as we are dealing with a supervised machine learning problem. Before tackling the minimisation of the Asimov loss function, we performed a pre-training set of runs for 5 epochs, along the lines suggested in Ref. [20], using a steeper loss function. A longer run, with about 20-30 epochs, was then done.
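To make the notion of an Asimov loss concrete, here is a minimal sketch of how such a loss can be built from classifier outputs. It assumes that the expected yields s and b are obtained by summing the weighted outputs over true signal and true background events respectively, and that the background uncertainty is a fixed fraction of b. These choices, as well as the names `asimov_z` and `asimov_loss` and the toy numbers, are assumptions for illustration and not necessarily the exact implementation used here.

```python
import math


def asimov_z(s, b, sigma_b):
    """Asimov significance with absolute background uncertainty sigma_b > 0."""
    n, var = s + b, sigma_b ** 2
    term1 = n * math.log(n * (b + var) / (b * b + n * var))
    term2 = (b * b / var) * math.log(1.0 + var * s / (b * (b + var)))
    return math.sqrt(max(2.0 * (term1 - term2), 0.0))


def asimov_loss(outputs, labels, weights, rel_syst):
    """Inverse Asimov significance built from soft classifier outputs.

    outputs  -- classifier scores in [0, 1]
    labels   -- 1 for signal events, 0 for background events
    weights  -- per-event weights (expected events represented by each entry)
    rel_syst -- relative systematic uncertainty on the background, must be > 0
    """
    s = sum(w * p for p, y, w in zip(outputs, labels, weights) if y == 1)
    b = sum(w * p for p, y, w in zip(outputs, labels, weights) if y == 0)
    if b <= 0.0:
        raise ValueError("soft background yield must be positive")
    if s <= 0.0:
        return float("inf")  # no signal selected: zero significance
    z = asimov_z(s, b, rel_syst * b)
    return float("inf") if z == 0.0 else 1.0 / z


# Toy sample (hypothetical): two signal and four background events.
labels = [1, 1, 0, 0, 0, 0]
weights = [5.0, 5.0, 25.0, 25.0, 25.0, 25.0]
good_outputs = [0.9, 0.8, 0.1, 0.2, 0.1, 0.2]   # separates the two classes
poor_outputs = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # no separation at all

loss_good = asimov_loss(good_outputs, labels, weights, rel_syst=0.5)
loss_poor = asimov_loss(poor_outputs, labels, weights, rel_syst=0.5)
```

Because the yields are smooth functions of the outputs, a loss of this kind can be minimised with gradient-based optimisers, and a classifier that separates the classes better yields a smaller loss, as the two toy outputs show.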
The outcome of these runs was the ability to classify events as signal or background, and to assign an Asimov significance estimate to a particular choice of the $c_{HW}$ coefficient (the strength of the deviation) and luminosity (the amount of available data).
In Fig. 3 we show the effect of pretraining in separating signal from background. The plots show the distribution of the signal benchmark point with $c_{HW} = 0.03$ (red) and background (blue) events as a function of the classifier output. The left plot is the outcome of performing an initial pretraining run with 5 epochs. The middle and right plots show the final distribution after a longer run was performed. In all the plots, the solid distribution corresponds to the outcomes on the training sample (70% of the sample), whereas the dots correspond to the test sample. The fact that the solid (train) and dotted (test) distributions are similar is an indication that the algorithm is not overfitting. The middle plot compares BSM with SM Higgs production. The right plot is the separation between BSM and a reducible background, Z+HF, where a cut on the $m_{b\bar{b}}$ variable in the Higgs mass window was applied. By comparing the middle and right plots, one sees that the reducible background is easier to remove than the genuine SM Higgs background, as expected.

V. RESULTS
The performance of our procedure can first be evaluated by looking at the ROC curve in Fig. 4, where we show the signal efficiency and background rejection curves. We present two examples of SMEFT effects, our benchmark value $c_{HW} = 0.03$ and a very small value, 0.001, which approaches the SM case. As expected, larger values of $c_{HW}$ present a better AUC and higher significance.
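For readers less familiar with the AUC, the sketch below computes it directly from classifier scores using its rank interpretation: the probability that a randomly chosen signal event scores higher than a randomly chosen background event. The function name `roc_auc` and the toy scores are hypothetical; the quadratic loop is chosen for clarity, not speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic.

    scores -- classifier outputs, higher means more signal-like
    labels -- 1 for signal events, 0 for background events
    Ties between a signal and a background score count as one half.
    """
    signal = [s for s, y in zip(scores, labels) if y == 1]
    background = [s for s, y in zip(scores, labels) if y == 0]
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))


# Toy example: three signal and three background events.
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
auc_toy = roc_auc(toy_scores, toy_labels)          # 8 of 9 pairs correctly ordered
auc_random = roc_auc([0.5] * 6, toy_labels)        # no separation gives 0.5
```

An AUC of 0.5 corresponds to a classifier with no separation power, which is the limit approached by very small values of the SMEFT coefficient.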
Perhaps a more intuitive way to understand this ROC curve is to compute the predicted identity of events. In the right panel of Fig. 4, we show a kinematic distribution of true signal events, separated by their predicted identity. Unsurprisingly, events with high energy are easier to distinguish from the SM backgrounds. This can be traced back to the Feynman rule in Eq. 1, where the SMEFT effects are momentum dependent and tend to lead to higher kinematic reach than SM interactions.
Nevertheless, quantities like the ROC curve and its AUC do not provide the answers we need in Particle Physics. We are interested not only in acceptance and rejection, but also in the dependence on increasing luminosity and the effect of systematic uncertainties, which are often disregarded in machine learning studies. In the left panel of Fig. 5 we show the Asimov significance in the 0L channel, for a choice of systematic uncertainty at 50%, as a function of luminosity for various choices of the SMEFT coefficient. The bands correspond to 2σ ranges. In contrast with the results from global fits, we find that values much below the 0.03 benchmark may be excluded by the Run2 data. The extent of this current exclusion cannot be obtained in a reliable fashion from our analysis, as we performed a simplistic leading-order parton+shower analysis. Nevertheless, one would infer that sensitivity to values of $c_{HW}$ around 0.001 could be obtained using the combined CMS and ATLAS Run2 data.
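The interplay between luminosity and systematic uncertainty can be made explicit with a short worked limit. The derivation below is a sketch under assumptions of our own: the standard closed-form Asimov significance, a background uncertainty that is a fixed fraction f of the background, $\sigma_b = f\,b$, and a fixed signal-to-background ratio $r = s/b$, so that both yields grow linearly with luminosity.

```latex
% Assumptions: \sigma_b = f\,b and s = r\,b with f and r fixed,
% so that b (and hence s) scales linearly with the luminosity.
\begin{align}
Z_A^2 &= 2\left[(s+b)\ln\frac{(s+b)(b+\sigma_b^2)}{b^2+(s+b)\sigma_b^2}
        - \frac{b^2}{\sigma_b^2}\ln\!\left(1+\frac{\sigma_b^2\, s}{b\,(b+\sigma_b^2)}\right)\right] \\
      &= 2\left[(1+r)\,b\,\ln\!\left(1+\frac{r}{1+(1+r)f^2 b}\right)
        - \frac{1}{f^2}\ln\!\left(1+\frac{r f^2 b}{1+f^2 b}\right)\right] \\
      &\xrightarrow{\;b\to\infty\;} \frac{2}{f^2}\left[\,r-\ln(1+r)\,\right]
        \simeq \frac{r^2}{f^2} \quad (r \ll 1).
\end{align}
```

Under these assumptions the significance does not grow indefinitely with luminosity but saturates at roughly r/f, so for a small deviation the achievable sensitivity is eventually set by the size of the systematic uncertainty rather than by the amount of data.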
As one can see from the left panel of Fig. 4, $c_{HW} = 0.001$ seems a limiting case for our algorithm in the 0L channel, as the AUC is very close to 0.5. We have chosen the 0L channel as it generally provides the best sensitivity to SMEFT [13], but one may wonder whether one could improve the sensitivity to this difficult point by combining with the other two channels (1L and 2L). The right panel in Fig. 5 shows the increase of sensitivity due to combination. For the small SMEFT deviation $c_{HW} = 0.001$ and 50% systematics, the improvement is within the error bars of the Asimov significance. A better handle on systematics could make the combination much more effective.

VI. OUTLOOK
With the increasing experimental understanding of the LHC data, new ways to search for new physics open up. In particular, the use of detailed kinematic information is the next frontier in terms of LHC data characterisation. More capabilities come with more ambitions, particularly in terms of the complexity of new phenomena one can hope to tackle. We have identified one channel (VH) which is both relatively well understood and broad in terms of its kinematic reach, and a set of Machine Learning techniques which could allow us to detect new physics in the behaviour of the Higgs boson. We chose the SMEFT as a template of the kind of deviations one could expect in the Higgs via virtual effects of new particles.
Within the framework of our analysis, we found the 0L channel to be dominant, which was expected. We obtained a limit on the SMEFT coefficient $c_{HW}$ of 0.001, about 30 times better than the current constraint from a global analysis [7] with the Run2 data. This result shows the potential of incorporating these techniques into SMEFT studies.
Our analysis could be improved in a number of ways. First and foremost, a more realistic simulation could be performed by the experiments, including NLO SMEFT effects [22]. Secondly, although we found that deep layers led to overfitting, and a shallow NN was more suitable, new algorithms could be explored to increase sensitivity. In particular, one could use outlier detection without supervision. Thirdly, we should understand the effect of switching on more than one deviation along the lines described in Ref. [9]. This should be the stepping stone to a more global use of Machine Learning techniques in the area of global fits to SMEFT properties.
- $\slashed{E}_T$: missing transverse energy

FIG. 2. A few illustrative 1D and 2D feature plots for inclusive 0L, 1L and 2L SM-Higgs production. Red dots correspond to background (SM Higgs production) and green dots to signal (Higgs SMEFT). Note the broader kinematic reach of the signal.

FIG. 3. Distribution of 0L signal (red) and background (blue) events as a function of the classifier output. The left plot is the outcome of performing an initial pretraining run with 5 epochs. The middle (SM Higgs) and right (Z+HF) plots show the final distribution after a more precise, longer run is done. The solid distribution corresponds to the outcomes on the training sample (70% of the sample), whereas the dots correspond to the test sample.

FIG. 4. Left: ROC curve for two values of the SMEFT coefficient in the 0L channel. Right: Classification of true signal events for $c_{HW} = 0.03$ and their mapping to kinematic features.

- $\Delta R_{wl}$: separation between lepton and W boson in the $\eta-\phi$ plane
- $\Delta\phi_{b_1 l}$: azimuthal angular separation between leading b-jet and lepton
- $\Delta\phi_{l \slashed{E}_T}$: azimuthal angular separation between lepton and $\slashed{E}_T$

1L channel:
- $\slashed{E}_T$: missing transverse energy
- $\Delta\phi_{b_1 \slashed{E}_T}$: azimuthal angular separation between leading b-jet and $\slashed{E}_T$

2L channel:
- $p_T^{l_1}$: transverse momentum of the leading lepton
- $p_T^{l_2}$: transverse momentum of the sub-leading lepton
- $\Delta R_{ll}$: separation between the two leptons in the $\eta-\phi$ plane
- $\Delta\phi_{b_1 l_1}$: azimuthal angular separation between leading b-jet and leading lepton
- $\Delta\phi_{b_2 l_1}$: azimuthal angular separation between sub-leading b-jet and leading lepton

Channel       Cuts
…             … GeV, $|\eta_l| < 2.7$, $\slashed{E}_T > 30$ GeV, $p_T^V > 150$ GeV
2L            $p_T^l > 7$ GeV, $|\eta_l| < 2.7$, $p_T^V > 75$ GeV, leading lepton $p_T > 27$ GeV
0L, 1L, 2L    $p_T^b > 20$ GeV, $|\eta_b| < 2.5$, leading b-jet $p_T > 45$ GeV

TABLE I. Cuts applied at event generation level for both signal and background processes. In the case of Z+HF we apply an additional cut on $m_{b\bar{b}}$, i.e. $115 < m_{b\bar{b}} < 135$ GeV.