A rotation-equivariant graph neural network for learning hadronic SMEFT effects

We introduce a graph neural network architecture designed to extract novel phenomena in the Standard Model Effective Field Theory (SMEFT) context from LHC collision data. The proposed infrared- and collinear-safe architecture is sensitive to the angular orientation of radiation patterns in jets from hadronic decays of highly energetic massive particles. Equivariance with respect to rotations around the jet axis allows for extracting the information on the angular orientation decoupled from the jet substructure. We demonstrate the robustness of the approach and its potential for future probes of the SMEFT at the LHC through toy studies and with realistic event simulations of the WZ process in the semileptonic decay channel.


I. INTRODUCTION
The Large Hadron Collider (LHC) is a veritable gold mine of data, among whose most complex signatures are collimated sprays of particles (jets) originating from the hadronic decays of boosted massive particles. This richness presents both a challenge and a fertile ground for exploring fundamental physics in hadronic final states.
In recent years, machine learning algorithms have been increasingly adapted to more complex data representations, with the ensuing rise of the input feature dimension tamed by imposing the symmetries of the underlying physical problem (see Refs. [1][2][3] for an overview). In particular, message-passing graph neural networks (gNNs) [4][5][6] learn from relationships among particles that are interpreted as point clouds with manifest permutation invariance [7][8][9][10]. Such developments have markedly improved the tagging performance for the various objects that are reconstructed as highly energetic jets at collider experiments [11]. Incorporating invariance or equivariance for symmetry groups specific to certain tagging challenges refines the algorithms' inductive bias and is an active area of research [12][13][14][15][16][17][18][19][20][21][22][23]. Infrared and collinear (IRC) safety, in particular, was integrated into gNNs by means of energy-weighted message passing [24], generalizing Energy Flow Networks [25] and providing robustness to uncertainties in the modeling of the splitting or merging of soft and collinear particles. A rich technological toolkit thus extends the experiments' grasp on lower-level data representations that are particularly relevant for the hadronic final states considered in this work.
In the theoretical domain, a parallel interest in lower-level representations of hypothetical new phenomena is fueled by the absence of any compelling signal of new resonant phenomena. The Standard Model Effective Field Theory (SMEFT) [26][27][28] has emerged as the preferred language for describing hypothetical phenomena below an assumed energy scale that separates the energy scale of the LHC from resonant phenomena at much higher scales. The SMEFT is a powerful framework extending the standard model (SM) Lagrangian by field monomials whose prefactors, the Wilson coefficients, are the parameters of interest (POIs) in experimental measurements. These give the experimentalist a low-level representation of a wide range of hypothetical high-scale models.
At dimension six, SMEFT predictions for cross-sections are second-order polynomials in the Wilson coefficients. The linear term in this polynomial describes the SM-SMEFT interference. It is the only contribution not subject to further contributions from SMEFT operators of higher mass dimension and is, therefore, the unambiguous harbinger of dimension-6 SMEFT effects. In a wide range of final states involving decays of massive vector bosons, however, a naive experimental analysis may unintentionally remove the sensitivity to the linear contribution. In these cases, a dedicated angular analysis can sometimes "resurrect" the interference terms [40][41][42][43][44]. The orientation of the decay planes of the W or Z boson then provides crucial sensitivity because it can resolve the helicity configuration of the amplitudes, which is altered in the SMEFT [40]. Analyses of leptonic final states of diboson processes (VV with V = W, Z, γ, or H) already profit from the resulting boost in experimental sensitivity [45][46][47] and served as a motivating use case for the development of machine-learned optimal observables [36][37][38][39].
The main idea of this paper is to extend this simulation-based inference approach to hadronic final states and to extract SMEFT sensitivity via a gNN that is equivariant with respect to azimuthal rotations around the axis of a highly energetic jet originating from the hadronic decay of a boosted massive particle. The variable-length set of the jet's constituents is the input to the gNN. Its output is fed into dense layers that can also accept other features of the event. The algorithm is tailored towards the linear SM-SMEFT interference and exploits the spatial angular orientation of the radiation patterns with respect to the remainder of the event. A prototypical situation is sketched in Fig. 1. The rest of the paper is organized as follows. In Sec. II, we describe the subject of our study, the semileptonic WZ process, and the structure of the data sets. In Sec. III, we formulate the algorithm. We elucidate its main features in simple toy studies presented in Sec. IV. The application to simulated WZ events is provided in Sec. V, and we give conclusions and an outlook in Sec. VI.

FIG. 1. Sketch of the pp → WZ process, the decay plane, and the decay-plane angle $\varphi_{\mathrm{decay}}$ of the W boson decay. The $\Delta R_i$ and $\varphi_i$ coordinates of a constituent $i$ of a highly energetic jet are defined with respect to the beam plane and the jet axis.

II. THE SEMILEPTONIC WZ FINAL STATE
Despite the relatively small number of expected events in the high-$p_T$ regime, we can nevertheless probe SMEFT operators because they induce energy-growing corrections to the SM amplitudes [38,39]. Among the diboson final states providing SMEFT sensitivity to LHC data analyses [45][46][47], we choose the comparably simple case of pp → WZ as a motivating example. We consider the semileptonic decay channel, where Z → ℓℓ and W → qq, and restrict ourselves to the highly energetic phase space $p_T(\mathrm{W}) > 300$ GeV, where the W boson is reconstructed as a single jet of approximately massless constituents, clustered with the anti-$k_T$ algorithm [48] in the FASTJET implementation [49] with a radius parameter of R = 0.8 (AK8). We focus on the interference contribution to the differential cross-section from the operators $O_W$ and $O_{\widetilde{W}}$, whose Wilson coefficients we denote by $C_W$ and $C_{\widetilde{W}}$. The operators $O_W$ and $O_{\widetilde{W}}$ induce CP-even and CP-odd modifications of the gauge boson self-interactions, while another operator, $O^{(3)}_{Hq}$, modifies the light quark-gauge boson vector-like interaction. At the linear level, these operators exhibit energy growth with the center-of-mass energy $\sqrt{s}$ of the diboson system. The operator $O^{(3)}_{Hq}$ contributes dominantly to the helicity configuration in which both bosons are longitudinally polarized and the SM contribution is constant in $s$ at high energy. Its SM-SMEFT interference term therefore introduces energy growth but does not modify the distributions of the azimuthal decay-plane angles, and we do not consider it further.
The operators $O_W$ and $O_{\widetilde{W}}$, in contrast, contribute energy-growing amplitudes for transversely polarized gauge bosons with equal helicities, a configuration that is suppressed in the SM. Consequently, the interference contribution vanishes in the high-energy tail unless it is resurrected by a dedicated angular analysis [38][39][40]. In the leptonic decay channels, the azimuthal orientation of the decay plane, $\varphi_{\mathrm{decay}}$, shown in Fig. 1, drives the sensitivity. A dedicated multivariate analysis of the fully leptonic decay mode exploits this fact by extracting SMEFT sensitivity from global event kinematics, i.e., per-event features reconstructed from the lepton kinematics, in particular angular observables in the diboson rest frame [38,39].
In our case, we attempt to access this sensitivity in a similar way by extracting it with a graph neural network. Because of the special role of the azimuthal decay-plane angle, we construct the network to be equivariant with respect to rotations of the boosted jet's constituents around the jet axis. The relevant symmetry group is SO(2), irrespective of whether we choose our reference frame as the lab frame or, as done in Fig. 1, as the rest frame of the WZ system. Our events are instances in a data set

$$\mathcal{D} = \left\{ \left( x_{\mathrm{global}},\ \{x_{p,i}\}_{i=1}^{N_p} \right)_j \right\}_{j=1}^{N_{\mathrm{events}}}, \qquad (2)$$

where $x_{\mathrm{global}}$ denotes optional global event features pertaining to the kinematics of the WZ candidate event, the Z boson, the missing transverse energy, etc. In each event, there is also a list of $N_p$ approximately massless constituent particles of an AK8 jet [48]. These constituents could, for example, be provided by a particle-flow algorithm [50]. Each particle's feature vector $x_{p,i}$ contains the four-momentum $p^\mu_i$ and, in principle, further features that could, e.g., represent the quality of the association of the particle's track with the primary collision vertex, its probability of originating from a pileup vertex [51], or its charge. We emphasize that the data set in Eq. 2 is hierarchical in the sense that the reference frame for the $p^\mu_i$ depends on the event kinematics that also enter via $x_{\mathrm{global}}$. There are two natural choices for these reference frames. In the lab frame, the constituents' four-momenta are given by the transverse momentum $p_{T,i}$ of the particle, the azimuthal angular difference to the jet axis $\Delta\phi_i$, and the difference in pseudorapidity $\Delta\eta_i$. Alternatively, we can obtain a polar and an azimuthal angle after a Lorentz boost into the diboson rest frame. In this case, we can align the z-axis of the spherical coordinate system with the jet axis and measure the azimuthal angle, denoted by $\varphi_i$, with respect to the beam plane, as depicted in Fig. 1. In both cases, we denote Euclidean distances in the two-dimensional angular coordinates by $\Delta R$. Either way, the data set encodes information on the per-event decay plane that is hidden in the radiation patterns mapped to the variable-length constituent vector. The algorithm described in the next section is designed to extract its SMEFT sensitivity.
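To make the lab-frame coordinates concrete, the following minimal Python sketch converts constituents given as $(p_T, \eta, \phi)$ into features $\{p_{T,i}, \varphi_i, \Delta R_i\}$ relative to the jet axis. It assumes, for illustration only, that in the lab frame $\varphi_i$ is taken as the polar angle of the constituent in the $(\Delta\eta, \Delta\phi)$ plane; the helper names `wrap_angle` and `constituent_features` are hypothetical and not taken from the authors' code.

```python
import math


def wrap_angle(angle):
    """Map an angle (radians) into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def constituent_features(jet_eta, jet_phi, constituents):
    """Return (pT, varphi, dR) per constituent given as (pT, eta, phi).

    Assumption (illustration only): in the lab frame, varphi is the polar
    angle of the constituent in the (delta-eta, delta-phi) plane, measured
    from the delta-eta direction around the jet axis.
    """
    features = []
    for pt, eta, phi in constituents:
        d_eta = eta - jet_eta
        d_phi = wrap_angle(phi - jet_phi)  # handle the phi = +-pi seam
        d_r = math.hypot(d_eta, d_phi)     # Euclidean distance to the jet axis
        varphi = math.atan2(d_phi, d_eta)  # azimuth around the jet axis
        features.append((pt, varphi, d_r))
    return features
```

Wrapping $\Delta\phi$ into $[-\pi, \pi)$ matters for jets near $\phi = \pm\pi$. A rotation of the constituents in the $(\Delta\eta, \Delta\phi)$ plane then shifts all $\varphi_i$ by a common amount while leaving $p_{T,i}$ and $\Delta R_i$ unchanged, which is the SO(2) action the network is built to respect.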

III. THE ALGORITHM
The constituents $\{x_{p,i}\}_{i=1}^{N_p}$ of the highly energetic jet are individually reconstructed particles that can be viewed as a point cloud whose elements are feature vectors composed of observables like four-momenta, charge, and other constituent properties.
General message-passing gNNs construct graphs on the point cloud and iteratively change the representations of the graphs' nodes by updating the feature vectors $h_i$ with an aggregate of messages from the nodes in a neighborhood $\mathcal{N}(i)$ and, possibly, from the edges connecting the nodes [4][5][6]. In the most general case, a node-update function $\psi^{(l)}$ and a message-passing function $\phi^{(l)}$ are highly expressive learnable functions, typically implemented as multilayer perceptrons (MLPs). At each iteration $l+1$, these determine the feature vector of a node $i$ from the messages $m_{ij}^{(l)}$, which are obtained from the node features $h_i^{(l)}$, the neighbors' node features $h_j^{(l)}$, and, possibly, from edge features $e_{ij}^{(l)}$, via the general node-update formula

$$h_i^{(l+1)} = \psi^{(l)}\Big(h_i^{(l)},\ \mathop{\square}_{j\in\mathcal{N}(i)} m_{ij}^{(l)}\Big), \qquad m_{ij}^{(l)} = \phi^{(l)}\big(h_i^{(l)}, h_j^{(l)}, e_{ij}^{(l)}\big).$$

The aggregation function, denoted by $\square$, is permutation invariant and accumulates the messages from a suitably defined neighborhood $\mathcal{N}(i)$ of particle $i$. After $L$ iterations, the nodes are read out by accumulating the features $h^{(L)}$ with an aggregation function that typically spans all nodes in the graph. In the simplest setting, this output is fed into a final global read-out MLP whose parameters, together with the parameters of the message-passing and per-iteration read-out MLPs, are adjusted to minimize a problem-specific loss function.

First, we simplify the problem by demanding IRC safety via energy-weighted message passing, following Ref. [24]. This requirement imposes a number of restrictions. The input features for iteration $l = 0$ are only the four-vectors $p_i$ of the particles. The message-passing function, specifically, is given by

$$h_i^{(1)} = \sum_{j\in\mathcal{N}(i)} \omega_j^{(i)}\, m\big(\hat{p}_i, \hat{p}_j\big),$$

where $\hat{p}$ denotes the direction of the three-momentum of the particles. The energy weighting is implemented via the relative hardness

$$\omega_j^{(i)} = \frac{p_{T,j}}{\sum_{k\in\mathcal{N}(i)} p_{T,k}}, \qquad (3)$$

where, instead of the transverse particle momentum $p_T$, we could alternatively use another measure of the hardness of the particle, e.g., the energy or the three-momentum measured in a well-defined rest frame. Furthermore, the definition of the neighborhood cannot depend on the occurrence of infrared or collinear splits of the particles, which excludes, e.g., the otherwise common kNN algorithm. Instead, a particle $j$ is defined to be in the neighborhood $\mathcal{N}(i)$ of a particle $i$ according to the Euclidean distance in the (pseudo-)rapidity-azimuth plane, $\Delta R_{ij} = \sqrt{\Delta\phi(\hat{p}_i,\hat{p}_j)^2 + \Delta\eta(\hat{p}_i,\hat{p}_j)^2} \le \Delta R$. The threshold $\Delta R$ is a hyperparameter of the network. Choosing a specific per-event reference frame for the $N_p$ massless constituents of a given W jet candidate, the gNN inputs for $l = 0$ are represented by a feature list $\{p_{T,i}, \varphi_i, \Delta R_i\}_{i=1}^{N_p}$, as shown in Fig. 1.

Our algorithm is also SO(2) equivariant. In general, a function $\gamma: X \to Y$ is equivariant with respect to the transformations $g$ of a group $G$ if there are two representations $T_g$ and $S_g$, acting on the spaces $X$ and $Y$, respectively, such that $S_g(\gamma(x)) = \gamma(T_g(x))$. Because SO(2) is a one-dimensional group, we single out one angular coordinate, denoted by $h_\varphi$, and demand that the new representation $(h_\varphi, h) \in [-\pi,\pi]\times\mathbb{R}^M$ transforms equivariantly under the SO(2) group action as

$$T_{\Delta\varphi}: \varphi_i \mapsto \varphi_i + \Delta\varphi, \qquad S_{\Delta\varphi}: (h_\varphi, h) \mapsto (h_\varphi + \Delta\varphi,\ h). \qquad (4)$$

This group action applies a common shift $\Delta\varphi$ to the azimuthal coordinates $h_\varphi$ and to the azimuthal input features $\varphi_i$. The $M$ remaining features $h$ of this data representation transform invariantly. Thus, $h$ encodes information related to the substructure of the jet that is invariant under these rotations, while $h_\varphi$ can only contain information associated with quantities that transform equivariantly with the angle, effectively decoupling the two aspects.

FIG. 2. Sketch of the network configuration as explained in the text. The equivariant feature is represented by a double line and the algorithm's output by $f$.
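The IRC-safety argument can be checked numerically. The plain-Python sketch below, with hypothetical helper names, implements the relative hardness of Eq. 3, fixed-radius neighborhoods in a 2D angular plane (ignoring azimuthal wrap-around for brevity), and one energy-weighted aggregation in which a fixed stand-in function `message` replaces the learned MLP. Splitting a particle into two collinear halves, or adding a very soft particle, should leave the $p_T$-weighted pooled output unchanged up to rounding or terms proportional to the soft momentum.

```python
import math


def neighborhoods(coords, radius):
    """Fixed-radius neighborhoods N(i) in a 2D angular plane (i is in N(i)).

    Azimuthal wrap-around is ignored here for brevity.
    """
    return [[j for j, cj in enumerate(coords) if math.dist(ci, cj) <= radius]
            for ci in coords]


def relative_hardness(pts, neighbors):
    """omega_j^(i) = pT_j / sum_{k in N(i)} pT_k for all j in N(i), cf. Eq. 3."""
    total = sum(pts[k] for k in neighbors)
    return {j: pts[j] / total for j in neighbors}


def energy_weighted_layer(pts, coords, radius, message):
    """h_i = sum_{j in N(i)} omega_j^(i) * message(coords_i, coords_j)."""
    out = []
    for i, nbrs in enumerate(neighborhoods(coords, radius)):
        weights = relative_hardness(pts, nbrs)
        out.append(sum(w * message(coords[i], coords[j]) for j, w in weights.items()))
    return out


def pooled(pts, node_outputs):
    """Global energy-weighted read-out over all constituents."""
    total = sum(pts)
    return sum(p / total * h for p, h in zip(pts, node_outputs))


def message(ci, cj):
    """Stand-in for a learned message: any continuous function of directions."""
    return math.cos(3.0 * (cj[0] - ci[0])) + cj[1]
```

The fixed radius matters here: a kNN neighborhood would change membership when a particle is split in two, whereas a $\Delta R$ ball does not.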
Because the group manifold of SO(2) is a circle, we represent this coordinate in the complex plane as $e^{i h_{\varphi,i}}$, such that the message-passing relations are manifestly equivariant under the transformation (4) if we restrict the inputs to the message-passing neural networks, $f_\phi^{(l)}$ and $f_h^{(l)}$, to group invariants. For $l > 1$, the distance function defining $\mathcal{N}(i)$ can be generalized accordingly. Besides the features $h$, which are invariant under $S$ by construction, the only other invariants are differences of $\varphi$ coordinates, such that Eq. 5 is general. Other dependencies are not allowed if IRC safety is to be ensured, such that $x_{p,i}$ reduces to $p^\mu_i$, and, concretely, the inputs to the first iteration are

$$\big(h_{\varphi,i}^{(0)},\ h_i^{(0)}\big) = \big(\varphi_i,\ \Delta R_i\big),$$

where $\Delta R_i$ is measured with respect to the jet axis. Because the inputs to the neural networks $f_\phi$ and $f_h$ are the same, weights can be shared. In practice, a single neural network with $M^{(l)}+1$ outputs, simultaneously providing $f_\phi$ and $f_h$, has proven efficient. After $L$ iterations, the global pooling is carried out with the energy weighting in Eq. 3, but summing over all constituents, i.e., $\mathcal{N} = \{x_{p,i}\}_{i=1}^{N_p}$ [24]. Together with the global features $x_{\mathrm{global}}$, the pooled IRC-safe and SO(2)-equivariant outputs are fed into a final MLP. A sketch of the whole construction is provided in Fig. 2.
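The following sketch shows one possible message-passing step with the properties described above; it is an assumption for illustration and not necessarily the form of Eq. 5 used by the authors. Each node's angle is encoded as $e^{i h_{\varphi,j}}$, and the placeholder functions `toy_f_phi` and `toy_f_h` (hypothetical stand-ins for the MLPs $f_\phi^{(l)}$ and $f_h^{(l)}$) only see the invariants $h_i$, $h_j$, and $h_{\varphi,j} - h_{\varphi,i}$. Messages are weighted by the relative hardness of Eq. 3, and neighborhoods use the rotation-invariant distance between points $(\Delta R, \varphi)$ around the jet axis. The new angle is the phase of the weighted complex sum, so a common rotation of the inputs rotates it by the same amount while the invariant feature stays fixed.

```python
import cmath
import math


def wrap(angle):
    """Map an angle into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def polar_distance(r1, a1, r2, a2):
    """Distance between points (dR, varphi) around the jet axis; rotation invariant."""
    return math.sqrt(max(r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(a1 - a2), 0.0))


def neighborhoods(dr, varphi, radius):
    return [[j for j in range(len(dr))
             if polar_distance(dr[i], varphi[i], dr[j], varphi[j]) <= radius]
            for i in range(len(dr))]


def equivariant_layer(pts, nbrs, h_phi, h, f_phi, f_h):
    """One energy-weighted, SO(2)-equivariant message-passing step (sketch).

    f_phi and f_h only receive invariants: h_i, h_j and the angle difference.
    """
    new_phi, new_h = [], []
    for i, nb in enumerate(nbrs):
        total = sum(pts[j] for j in nb)
        z, acc = 0j, 0.0
        for j in nb:
            w = pts[j] / total                 # relative hardness, Eq. 3
            delta = wrap(h_phi[j] - h_phi[i])  # invariant under common shifts
            z += w * f_phi(h[i], h[j], delta) * cmath.exp(1j * h_phi[j])
            acc += w * f_h(h[i], h[j], delta)
        new_phi.append(cmath.phase(z))         # rotates with the inputs
        new_h.append(acc)                      # invariant feature
    return new_phi, new_h


def toy_f_phi(hi, hj, delta):
    """Hypothetical placeholder for f_phi^(l)."""
    return 1.0 + 0.5 * math.cos(delta) + hj


def toy_f_h(hi, hj, delta):
    """Hypothetical placeholder for f_h^(l)."""
    return hi * hj + math.sin(delta) ** 2
```

Here the first-iteration inputs are $(h_{\varphi,i}, h_i) = (\varphi_i, \Delta R_i)$ with a one-dimensional $h$; a real implementation would replace the placeholders by a single shared MLP with $M^{(l)}+1$ outputs.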
Technically, the algorithm is implemented in PyTorch [52] using the PyTorch Geometric (PyG) package [53]. The best performance was obtained with leaky ReLU activation functions with a slope parameter of 0.3. For training, we used the Adam optimizer of PyTorch with a learning rate of $10^{-3}$. Other choices for the activation functions (e.g., ReLU or sigmoid) and for the optimizer led to comparable performance, albeit with a slightly longer training time, which suggests that the performance of the algorithm is relatively robust. For the studies in the remainder of the paper, we found that a single iteration ($L = 1$) with a message-passing neural network with two hidden layers of 20 nodes each and a read-out MLP with the same configuration performs sufficiently well. This is, therefore, our baseline configuration. Higher numbers of iterations train more slowly, with similar performance for the cases discussed below.
Finally, we note the possibility of breaking IRC safety by including extra information via the input features while keeping the energy weighting in Eq. 3. For example, the constituents' charge could help resolve the parton-level ambiguity of the light quarks from the W decay. While jet charge measurements were performed by the ATLAS and CMS collaborations [54,55], the benefit of non-IRC-safe information, including vertex-association quality or the probability of originating from pileup vertices, cannot be confidently assessed without access to a comprehensive model of systematic uncertainties.

IV. TWO TOY SCENARIOS
We illustrate the algorithm's basic properties using a toy data set that aims to reflect a simple two-prong structure in a variable-length feature vector. We first let $\bar{N}_{\mathrm{part}} = 50$ be the mean number of constituents in a mock-up jet. For each training instance, we sample two Poisson random numbers $N_{1,2}$ with means $\bar{N}_{1,2} = \bar{N}_{\mathrm{part}}/2$, such that each prong has, on average, the same number of constituents. Subsequently, we draw $N_1$ and $N_2$ two-dimensional coordinates $x_i = (x_1, x_2)_i$ from the normal distributions $\mathcal{N}(\mu_1, \Sigma)$ and $\mathcal{N}(\mu_2, \Sigma)$, respectively. We choose $\Sigma = \mathrm{Diag}(0.3, 0.3)$. For the constituents' momenta $p_{T,i}$, we draw random numbers from a log-normal distribution $\ln\mathcal{N}(1, 0.2)$. We scale the $p_{T,i}$ by the likelihood of the constituent's 2D coordinate in the normal distribution it was obtained from, that is, $\mathcal{N}(x_i|\mu_{1,2}, \Sigma)$. Because we sampled the $x_i$ from normal distributions, each prong will likely have high-momentum particles close to the location parameters $\mu_{1,2}$. The sum of all constituent momenta, taken over both prongs, is finally scaled to 100, such that the mock-up jet's total momentum is not random. During training, there is no information on the constituents' origin; each instance in the training set $\mathcal{D}$ is given solely by a total of $N_j = N_{j,1} + N_{j,2}$ constituents $\{p_{T,i}, x_i\}_{i=1}^{N_j}$ and a binary truth label $y_j$. There are no global event features in this study. The loss function, for simplicity, is the mean-squared error

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{j=1}^{N} \big(f(\{p_{T,i}, x_i\}_j) - y_j\big)^2.$$

We have also tested the binary cross-entropy loss function and found no change in the performance. In scenario (A), we attempt a simple classification task between a two-prong signal category with $\mu^{\mathrm{sig}}_{1,2} = (\pm 1, 0)$ and a background category with $\mu^{\mathrm{bkg}}_{1,2} = (0, 0)$, the latter being equivalent to a single prong with a mean of $\bar{N}_{\mathrm{part}}$ constituents. Contour lines of the probability density function (pdf) at the 1σ, 2σ, and 3σ levels, alongside the constituents of a single illustrative signal event, are shown in Fig. 3 (left). Training this classifier on $10^5$ pseudo-events for both categories leads to a background efficiency of approximately 1% for a signal efficiency of 99%. It is more interesting to study the behavior of the gNN output features before they enter the final MLP for different types of input. For this purpose, we exploit the simplicity of our toy data set: a sample of signal events is, by construction, indistinguishable from a background sample when the constituents are shifted by $x_i \to x_i + \Delta x_i$ with $\Delta x_i = -\mu_{1,2}$, depending on whether constituent $i$ came from prong 1 or prong 2. For illustration, we generate $10^4$ independent test events and shift the constituents by $\Delta x_i = -\mu_1 + (s_{\Delta x}, 0)$ for prong 1 and $\Delta x_i = -\mu_2 - (s_{\Delta x}, 0)$ for prong 2. In this construction, the test sample is background-like for $s_{\Delta x} = 0$ and signal-like for $s_{\Delta x} = 1$. In Fig. 3 (middle), we show the median of the output score and the medians of the pooled gNN outputs before they enter the readout MLP. The readout MLP evidently learns to distinguish signal events from background events using the 20-dimensional internal representation $h$. The insignificance of the equivariant feature is expected, as there is no relevant azimuthal dependence of the prong structure in this classification task.
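The toy generator described above can be sketched in a few lines of standard-library Python. We assume that $\Sigma = \mathrm{Diag}(0.3, 0.3)$ is a covariance (variance 0.3 per coordinate) and that $\ln\mathcal{N}(1, 0.2)$ refers to the mean and width of the underlying normal distribution; since Python's `random` module has no Poisson sampler, Knuth's multiplication method is used. All function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Poisson sample via Knuth's multiplication method (fine for mean ~ 25)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def normal_pdf_2d(x, mu, var):
    """Density of an isotropic 2D normal with covariance var * identity."""
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-0.5 * d2 / var) / (2.0 * math.pi * var)


def toy_jet(rng, mus, n_part=50, var=0.3, total_pt=100.0):
    """Mock-up jet as a list of (pT, (x1, x2)) with one Gaussian prong per mu."""
    sigma = math.sqrt(var)
    constituents = []
    for mu in mus:
        for _ in range(poisson(rng, n_part / len(mus))):
            x = (rng.gauss(mu[0], sigma), rng.gauss(mu[1], sigma))
            # log-normal pT, scaled by the likelihood of x in its own prong
            pt = rng.lognormvariate(1.0, 0.2) * normal_pdf_2d(x, mu, var)
            constituents.append((pt, x))
    scale = total_pt / sum(pt for pt, _ in constituents)  # fix total momentum
    return [(pt * scale, x) for pt, x in constituents]
```

With `rng = random.Random(1)`, calling `toy_jet(rng, [(1.0, 0.0), (-1.0, 0.0)])` yields a signal jet, `[(0.0, 0.0), (0.0, 0.0)]` a background jet of scenario (A), and `[(0.0, 1.0), (0.0, -1.0)]` a background jet of scenario (B).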
The situation is very different in scenario (B), where we train a classifier to separate a signal with $\mu^{\mathrm{sig}}_{1,2} = (\pm 1, 0)$ from a background category with $\mu^{\mathrm{bkg}}_{1,2} = (0, \pm 1)$. This time, the signal has a prong structure along the $x_1$-axis, while the background has a prong structure along the $x_2$-axis. In Fig. 3 (right), we show similar results as before, but this time for a signal sample that is rotated around the origin by an angle $s_{\Delta\varphi}$. Therefore, $s_{\Delta\varphi} = 0$ or $s_{\Delta\varphi} = \pm\pi$ corresponds to a signal sample, while $s_{\Delta\varphi} = \pm\pi/2$ corresponds to a background sample. We first note from Fig. 3 (right) that the internal features do not vary at all with the angle of rotation $s_{\Delta\varphi}$. This manifests the invariance of $h$ under the operation $S_{\Delta\varphi}$ in Eq. 4. Secondly, we note that there is a reflection ambiguity in the toy setup that mimics a similar reflection ambiguity of the light quarks in the W boson decay. The probability density function is symmetric under the exchange of the two prongs, i.e., under a rotation by π; however, the value of $h_\varphi$ changes by π under this operation. Therefore, we investigate a function of the equivariant feature that is invariant under this reflection. The simplest such function is $\sin^2 h_\varphi$. The median of this quantity in the test data set is shown in blue in Fig. 3 (right) as a function of $s_{\Delta\varphi}$. It follows the rotation of the inputs such that the events in the test sample are identified as signal events for rotations close to even multiples of π/2 and as background for rotations close to odd multiples of π/2. The modulations of $\sin^2 h_\varphi$ and the classifier output are not in phase, but that only reflects the network's freedom in the internal representation.
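The learned $h_\varphi$ has no closed form, but the reflection ambiguity can be illustrated with a hand-crafted, non-learned stand-in of our own choosing: the $p_T$-weighted principal axis of the 2D constituent cloud. Its angle is defined only modulo π, it rotates together with the inputs, and its $\sin^2$ separates prongs along $x_1$ (values near 0) from prongs along $x_2$ (values near 1). The function names and the example event are hypothetical.

```python
import math


def prong_axis_angle(constituents):
    """pT-weighted principal-axis angle of a 2D cloud; defined only modulo pi."""
    total = sum(pt for pt, _ in constituents)
    m1 = sum(pt * x[0] for pt, x in constituents) / total
    m2 = sum(pt * x[1] for pt, x in constituents) / total
    s11 = sum(pt * (x[0] - m1) ** 2 for pt, x in constituents) / total
    s22 = sum(pt * (x[1] - m2) ** 2 for pt, x in constituents) / total
    s12 = sum(pt * (x[0] - m1) * (x[1] - m2) for pt, x in constituents) / total
    return 0.5 * math.atan2(2.0 * s12, s11 - s22)


def rotate(constituents, angle):
    """Rotate all constituents around the origin (the jet axis) by angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [(pt, (c * x[0] - s * x[1], s * x[0] + c * x[1])) for pt, x in constituents]


# A deterministic two-prong "signal" event with prongs along the x1-axis.
signal = [(2.0, (1.0, 0.05)), (2.0, (-1.0, -0.05)), (1.0, (0.8, -0.1)), (1.0, (-0.8, 0.1))]
```

Rotating `signal` by π/2 turns it into a background-like event with $\sin^2$ close to 1, while a rotation by π (the prong exchange) leaves $\sin^2$ unchanged, mirroring the behavior of $\sin^2 h_\varphi$ in Fig. 3 (right).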
In summary, the trained algorithm correctly reflects the basic properties of the classification task in both scenarios.

V. LEARNING FROM THE WZ PROCESS
We now showcase the performance of the algorithm in a more realistic setup using a sample of simulated WZ events in a semileptonic decay chain. The pp → W(→ qq)Z(→ ℓℓ) events are generated with the MADGRAPH5_aMC@NLO v2.6.5 event generator [56] in the leading-order configuration with one extra parton in the hard-scatter process. The simulation uses the NNPDF3.1 PDF set [57]. The renormalization and factorization scales are set dynamically to the MADGRAPH5 default, namely the transverse mass of the 2 → 2 system resulting from $k_T$ clustering. A requirement of $H_T > 300$ GeV, where $H_T$ is the scalar sum of the transverse momenta of the final-state partons, selects events with a sufficiently boosted W boson candidate. We use the SMEFTSim v3.0 [58] model with single SMEFT operator insertions to simulate an event sample at $C_W = C_{\widetilde{W}} = 1$ in order to ensure that the kinematic phase space is well populated under both the SM and the SMEFT scenarios. The PYTHIA v8.24 [59] package is used to simulate the parton shower and hadronization. The matching between the matrix-element calculation in MADGRAPH5 and the PYTHIA parton-shower model follows the MLM prescription [56]. The detector response is emulated using DELPHES v3.5.0 [60] with the ATLAS card, such that the setup is equivalent to the top quark reference data set described in Ref. [61].
Since the SMEFT effects induce a modulation in the distribution of $\varphi_{\mathrm{decay}}$, we first use the algorithm to regress this variable from the jet's constituents. Analogously to Sec. IV, the observed lab-frame features of a training event $j$ are given by

$$\{p_{T,i}, \varphi_i, \Delta R_i\}_{i=1}^{N_j},$$

where the $\varphi_i$ and $\Delta R_i$ are measured with respect to the jet axis and $N_j$ denotes the number of jet constituents. The parton-level $\varphi_{j,\mathrm{decay}}$ is the regression target. We consider only events with a reconstructed AK8 jet with $p_T > 500$ GeV, and we do not use any global event information. We train the algorithm on 80% of the WZ data set by minimizing a loss function built on the sine of the predicted and the true angles. This choice, in particular the sine function, implements invariance to the reflection ambiguity described in Sec. IV at the level of the loss function: the data do not provide information to distinguish constituents originating from up-type quarks from those originating from down-type quarks and are therefore invariant under the transformation $\varphi_{j,\mathrm{decay}} \to \pi - \varphi_{j,\mathrm{decay}}$. The sine removes this ambiguity, and this inductive bias improves the speed of convergence. It is, however, not needed in principle: we have verified that the network also converges with a standard mean-squared-error loss or, e.g., with a piecewise linear function with the same symmetry properties as the sine function. Figure 4 shows the two-dimensional distribution of the true and the regressed $\varphi_{\mathrm{decay}}$ in the remaining 20% of the WZ data set, indicating a sensible behavior of the algorithm.

We now turn to extracting SMEFT sensitivity, considering events with a Z boson candidate reconstructed from its decay products and a reconstructed jet, both with $p_T > 300$ GeV. For predicting SMEFT effects, we use a variant of the morphing technique [33] to obtain weighted predictions in the parameter space spanned by the Wilson coefficients. To this end, MADGRAPH5 with the MADWEIGHT [62] module is used to compute a per-event weight for a sufficiently large number of SMEFT parameter points $C = (C_W, C_{\widetilde{W}})$. Because second-order polynomials accurately describe
the SMEFT matrix elements with a single operator insertion, a small number of such evaluations is sufficient to determine the coefficients of this polynomial, denoted by $\omega_j(C)$ for an event with label $j$. We choose an overall normalization of the event weights such that, at the SM parameter point $C = 0$, we have $\sum_j \omega_j(0) = \mathcal{L}_{\mathrm{int}}\,\sigma_{\mathrm{WZ}}(\mathrm{SM})$, where $\mathcal{L}_{\mathrm{int}}$ is the integrated luminosity. It follows that the SMEFT prediction for any Poisson yield $\lambda(C)$ in a phase-space volume $\Delta x$, defined in terms of the observed features $x$, is given by the sum of the polynomial per-event weights,

$$\lambda(C) = \mathcal{L}_{\mathrm{int}}\int_{\Delta x} \mathrm{d}x\, \frac{\mathrm{d}\sigma(C)}{\mathrm{d}x} = \mathcal{L}_{\mathrm{int}}\,\sigma(C)\int_{\Delta x}\mathrm{d}x\, p(x|C) = \mathcal{L}_{\mathrm{int}}\,\sigma(C)\int_{\Delta x}\mathrm{d}x\int\mathrm{d}z\, p(x,z|C) \approx \sum_{j:\, x_j\in\Delta x} \omega_j(C).$$

The second expression gives the yield in terms of the differential cross-section, the third relates the cross-section to the detector-level pdf $p(x|C)$, the fourth introduces the joint pdf of the observation $x$ and the latent features $z$, and the last term, finally, is the Monte Carlo approximation from a simulated sample $\mathcal{D}_{\mathrm{sim}} = \{\omega_j(C), x_j, z_j\}_{j=1}^{N_{\mathrm{sim}}}$. Because Monte Carlo simulation samples the joint $(x, z)$-space [31], the ratio of two polynomial event weights is expressed in terms of the ratio of the joint likelihoods,

$$\frac{\omega_j(C)}{\omega_j(0)} = \frac{\sigma(C)}{\sigma(0)}\,\frac{p(x_j, z_j|C)}{p(x_j, z_j|0)}. \qquad (10)$$

Here, the first factor on the right-hand side accounts for the dependence of the total cross-section on the Wilson coefficients. The simulation-based inference technique we use for learning the SM-SMEFT interference is based on Refs. [29][30][31][32][33][34] and is similar to the SALLY method [30]. It capitalizes on the fact that the joint likelihood ratio is tractable, i.e., simulation allows evaluating Eq. 10 as a function of $C$.
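The statement that a few evaluations fix the per-event polynomial can be made concrete. For two Wilson coefficients $C = (C_W, C_{\widetilde{W}})$, a second-order polynomial has six coefficients, so weights at six suitably chosen parameter points determine it exactly. The sketch below assumes a hypothetical set of six points and a hypothetical coefficient order, and it uses a small Gaussian elimination rather than any MADGRAPH5 or MADWEIGHT interface; `predicted_yield` implements the Monte Carlo approximation of $\lambda(C)$ by summing the polynomial weights of the events that fall into a region.

```python
def monomials(c):
    """Basis of a second-order polynomial in C = (C_W, C_Wtilde)."""
    c1, c2 = c
    return [1.0, c1, c2, c1 * c1, c2 * c2, c1 * c2]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


# Hypothetical parameter points (C_W, C_Wtilde) at which weights were evaluated.
POINTS = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (1.0, 1.0)]


def fit_weight_polynomial(weights_at_points):
    """Polynomial coefficients of omega_j(C) from its values at POINTS."""
    return solve([monomials(c) for c in POINTS], weights_at_points)


def weight_at(coeffs, c):
    return sum(a * b for a, b in zip(coeffs, monomials(c)))


def predicted_yield(coeffs_in_region, c):
    """Monte Carlo yield lambda(C): sum of the polynomial weights in a region."""
    return sum(weight_at(co, c) for co in coeffs_in_region)
```

Any six points work as long as the resulting 6×6 system is non-singular; the points above fix the constant, the pure linear and quadratic terms along each axis, and the mixed term.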
Because our targeted effect is linear in $C$, we are interested in the model's behavior near the SM, where the score vector

$$t(x) = \nabla_C \log \frac{p(x|C)}{p(x|0)}\bigg|_{C=0} \qquad (11)$$

is a sufficient statistic [31]. It provides a locally optimal observable, i.e., it extracts the maximum amount of information on $C$ from the data in the vicinity of the SM [31]. At first sight, it appears problematic that our quantity of interest in Eq. 11 involves a ratio of detector-level likelihoods, with separate implicit latent-space integrations in the numerator and denominator, while Eq. 10, the quantity available in simulation, contains a ratio of joint pdfs without that integration. Following Refs. [29,30], we can nevertheless learn the detector-level score by expanding the log-derivative of Eq. 10 to linear order,

$$t(x_j, z_j) \equiv \nabla_C \log \frac{p(x_j,z_j|C)}{p(x_j,z_j|0)}\bigg|_{C=0} = \frac{\nabla_C\, \omega_j(C)|_{C=0}}{\omega_j(0)} - \frac{\nabla_C\, \sigma(C)|_{C=0}}{\sigma(0)}. \qquad (12)$$

This quantity is tractable for simulated events. It is then straightforward to show [29] that the analytic loss function

$$\mathcal{L}[f] = \int \mathrm{d}x\, \mathrm{d}z\, p(x, z|0)\, \big\| f(x) - t(x,z) \big\|^2, \qquad (13)$$

for an infinitely expressive $f$ that depends on the observable $x$ but not on the latent $z$, is minimized by $f^*(x) = t(x)$ as defined in Eq. 11. Our loss function is, therefore, the empirical version of Eq. 13,

$$\hat{\mathcal{L}}[f] = \frac{\sum_j \omega_j(0)\, \big\| f(x_j) - t(x_j, z_j) \big\|^2}{\sum_j \omega_j(0)},$$

where we make use of Eq. 12. All its ingredients are available in the simulation. Finally, we can remove $\sigma(C)$ in Eq. 12 if we use $f$ only as a discriminator in a hypothesis test. The total cross-section has no impact on its discriminative power, as this term only provides a constant shift to the predicted value, common to all events. Minimizing this loss function over $3\cdot 10^5$ training events provides a surrogate of the detector-level two-component score vector of $C_W$ and $C_{\widetilde{W}}$. The distance parameter $\Delta R$ in the gNN is set to 0.4. The same network configuration as in Sec. IV is used, with little observed dependence on the MLP configuration. The detector-level transverse momenta of the leptonic and hadronic bosons show a mild dependence on $C$ at the linear level and are, therefore, used as global features for the $C_W$ component, while we do not use global features for the $C_{\widetilde{W}}$ component.
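The training target and its minimizer can be illustrated without a neural network. Assuming each event weight is stored as the coefficients of $\omega_j(C) = a_0 + a_1 C_W + a_2 C_{\widetilde{W}} + a_3 C_W^2 + a_4 C_{\widetilde{W}}^2 + a_5 C_W C_{\widetilde{W}}$ (a hypothetical storage order), the event-wise joint score at the SM reduces to $(a_1, a_2)/a_0$ once the constant $\sigma(C)$ term of Eq. 12 is dropped. Replacing the infinitely expressive $f$ by a function that may depend only on a discrete bin of $x$, the SM-weighted squared error is minimized by the SM-weighted mean of the joint score in each bin, the sample analogue of $f^*(x) = t(x)$. The function names and the weighting by $\omega_j(0)$ are assumptions of this sketch.

```python
from collections import defaultdict


def joint_score(coeffs):
    """grad_C omega_j(C) / omega_j(0) at C = 0 for coefficients
    [a0, a1, a2, a3, a4, a5] of a0 + a1*C_W + a2*C_Wt + a3*C_W^2 + a4*C_Wt^2 + a5*C_W*C_Wt.

    The -grad_C sigma / sigma term of Eq. 12 is dropped (common constant shift).
    """
    a0, a1, a2 = coeffs[0], coeffs[1], coeffs[2]
    return (a1 / a0, a2 / a0)


def weighted_mse(predictions, scores, sm_weights):
    """Empirical loss: SM-weighted squared error between f(x_j) and t(x_j, z_j)."""
    num = sum(w * ((f[0] - t[0]) ** 2 + (f[1] - t[1]) ** 2)
              for f, t, w in zip(predictions, scores, sm_weights))
    return num / sum(sm_weights)


def optimal_binned_f(x_bins, scores, sm_weights):
    """Minimizer of weighted_mse if f may depend only on the bin of x:
    the SM-weighted mean of the joint score in each bin."""
    num = defaultdict(lambda: [0.0, 0.0])
    den = defaultdict(float)
    for b, t, w in zip(x_bins, scores, sm_weights):
        num[b][0] += w * t[0]
        num[b][1] += w * t[1]
        den[b] += w
    return {b: (num[b][0] / den[b], num[b][1] / den[b]) for b in den}
```

The network never sees $z$, so events with the same $x$ but different latent histories pull $f(x)$ towards their weighted average, which is exactly why the regression target can be the tractable joint score while the result approximates the intractable detector-level score.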
We show the result of the training in Fig. 5 (left) for a statistically independent test data set. The detector-level score exhibits the expected sinusoidal modulation as a function of the true decay-plane angle. The gNN accurately reproduces this dependence, showing that it can indeed recover SMEFT effects from the angular radiation patterns.
In Fig. 5 (middle, right), we perform one-dimensional likelihood scans of our surrogate model, normalized to 2000 expected events. Without incorporating systematic uncertainties, this procedure leads to optimistic results. Nevertheless, comparing the surrogate model's performance to the truth level is enlightening. The negative log-likelihood (NLL) ratio is shown for three different test statistics. First, the true decay-plane angle $\varphi_{\mathrm{decay}}$ shows similar performance as a binned Poisson test statistic and as an unbinned test statistic, suggesting that a binned analysis in the regressed $\varphi_{\mathrm{decay}}$ may have good sensitivity. Second, the unbinned surrogate model performs well for both the $C_W$ and the $C_{\widetilde{W}}$ Wilson coefficients. Despite the complexity of the hadronic final state, the gNN thus manages to extract the leading linear SMEFT dependence from the boosted jet's constituents. In the case of $C_W$, the surrogate model also profits from the SMEFT dependence of the global event features, which increases the sensitivity. This showcases the ability of the algorithm to combine information from the radiation patterns and from observables sensitive to energy growth. For the $C_{\widetilde{W}}$ coefficient, the energy growth is much smaller, and $\varphi_{\mathrm{decay}}$ dominates the sensitivity.
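A binned Poisson likelihood scan of the kind compared in Fig. 5 can be sketched as follows. The linear model $\nu_b(C) = \nu_b^{\mathrm{SM}}(1 + C a_b)$, the Asimov data set (observed counts equal to the SM expectation), and the modulation $a_b = 0.3\cos 2\varphi_b$ over 10 bins with 2000 events in total are illustrative assumptions, not the paper's inputs. The function returns $-2\ln$ of the likelihood ratio with respect to the saturated model.

```python
import math


def binned_nll_ratio(observed, sm_expected, linear_coeffs, c):
    """-2 ln [L(c) / L_saturated] for independent Poisson bins.

    Illustrative linear model: nu_b(c) = sm_b * (1 + c * a_b).
    """
    q = 0.0
    for n, sm, a in zip(observed, sm_expected, linear_coeffs):
        nu = sm * (1.0 + c * a)
        q += 2.0 * (nu - n)
        if n > 0:
            q += 2.0 * n * math.log(n / nu)
    return q


# Illustrative setup: 10 bins in varphi_decay, 2000 SM events, Asimov data.
N_BINS = 10
centers = [-math.pi + (b + 0.5) * 2.0 * math.pi / N_BINS for b in range(N_BINS)]
sm = [2000.0 / N_BINS] * N_BINS
mod = [0.3 * math.cos(2.0 * phi) for phi in centers]  # assumed modulation a_b
scan = {c: binned_nll_ratio(sm, sm, mod, c) for c in (-0.5, -0.25, 0.0, 0.25, 0.5)}
```

Because the assumed modulation sums to zero over the full $\varphi_{\mathrm{decay}}$ range, the total yield carries no linear sensitivity in this toy; the scan responds to $C$ only through the shape, which mirrors why the interference has to be resurrected by an angular analysis.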
In combination, the results show that our gNN can extract the linear SMEFT dependence from the decay plane angle of a hadronically decaying W boson, unlocking the large hadronic branching fractions for future analyses in search of new physics.

VI. CONCLUSIONS AND OUTLOOK
In this paper, we have constructed an IRC-safe and rotation-equivariant gNN, whose main input is the variable-length list of the particle constituents of a jet produced in the hadronic decay of a boosted massive particle.The algorithm is able to extract information on the angular orientation of the radiation patterns present in the jet, decoupled from other aspects of the jet substructure thanks to the equivariance of the network.
We have investigated the main features in simple toy studies and applied the algorithm to linear SMEFT effects in semileptonic final states of the WZ process at the LHC.It learns a surrogate of the score vector, thus providing an optimal observable for small deviations from the SM.We have shown that the score is well reproduced as a function of the true decay plane angle, thus fully extracting its SMEFT sensitivity.Moreover, we incorporate information from additional observables encoding effects from energy growth, boosting the sensitivity towards the theoretical optimum.
The algorithm allows the large hadronic branching fractions to be utilized in future SMEFT analyses.It is also suitably general to be useful in a variety of applications, potentially including three-prong decays of boosted top quarks or semileptonic final states of vector boson scattering at the LHC.

FIG. 3. Left: Contours of the probability density functions used in the toy example, along with an illustrative single event. Middle: Median of the MLP output score (black) in scenario (A), as well as the medians of the 20 gNN outputs that serve as inputs to the readout MLP (gray), in arbitrary units. The equivariant feature $h_\varphi$ is shown in blue. Right: Median of the MLP output score (black) in scenario (B), as well as the medians of the first 7 pooled gNN outputs that serve as inputs to the readout MLP (gray), in arbitrary units. The median of $\sin^2 h_\varphi$ is shown in blue.

FIG. 4. Two-dimensional distribution of the true and regressed angle $\varphi_{\mathrm{decay}}$ in the test data set.

FIG. 5. Left: Mean score as a function of $\varphi_{\mathrm{decay}}$, obtained with the true model and the regressed surrogate model. Center and right: Likelihood scans as a function of $C_W$ (center) and $C_{\widetilde{W}}$ (right) using different likelihood models, assuming 2000 observed events. The surrogate model described in this paper is compared with an MLP that takes $\varphi_{\mathrm{decay}}$ as its only input and with the likelihood function of a counting experiment with 10 bins in $\varphi_{\mathrm{decay}}$.