Exploring the SMEFT at dimension-8 with Drell-Yan transverse momentum measurements

We demonstrate that measurements of the neutral-current Drell-Yan transverse momentum distribution binned in invariant mass are sensitive to unexplored dimension-8 parameters of the Standard Model Effective Field Theory (SMEFT). These distributions are sensitive to four-fermion operators with additional QCD field strength tensors. The determination of the Wilson coefficients of these operators provides a useful diagnostic tool that distinguishes possible ultraviolet completions of the SMEFT. We study how well these effects can be probed by current LHC data, and explore the sensitivity of the future high-luminosity LHC (HL-LHC) to these operators. We find that the HL-LHC data has the potential to strongly probe this sector of the SMEFT.


Introduction
The Standard Model (SM) of particle physics successfully describes phenomena ranging from low-energy nuclear physics to high-energy collisions. However, since it contains neither neutrino masses nor dark matter, and cannot explain certain observations such as the matter-antimatter asymmetry in the universe, undiscovered physics beyond the SM that explains these mysteries must exist. Experiments at the Large Hadron Collider (LHC) and elsewhere are probing the SM at the TeV scale, searching for solutions to these outstanding problems. Since no conclusive deviation from SM predictions has yet been found, a major theme of current research is to understand how heavy new physics can be indirectly probed and constrained by available and upcoming data. This effort helps guide searches for new physics by suggesting in which channels measurable deviations from SM predictions may occur given the current bounds. In the event of a discovery it would also indicate which measurements can serve as diagnostic tools to distinguish between different models of new physics.
A convenient theoretical framework for investigating indirect signatures of heavy new physics is the SM Effective Field Theory (SMEFT). The SMEFT is formed by adding higher-dimensional operators to the SM Lagrangian that are consistent with the SM gauge symmetries and formed only from SM fields. The higher-dimensional operators in the SMEFT are suppressed by appropriate powers of a characteristic energy scale Λ below which heavy new fields are integrated out. The SMEFT encapsulates a broad swath of new physics models, making it easier to simultaneously study numerous theories without focusing on details of their ultraviolet completions that do not matter at low energies. The use of the SMEFT framework to analyze LHC data is similar in spirit to the use of the S and T parameters at LEP to bound entire classes of new physics models, and the global fitting of SMEFT parameters at the LHC promises to provide as powerful a probe of beyond-the-SM theories as the global electroweak precision fit did at LEP. Complete, non-redundant bases for the dimension-6 [1][2][3] and dimension-8 operators [4,5] have been constructed. Odd-dimensional operators violate lepton number and are not considered here. It is an ongoing effort to analyze the numerous available data within the SMEFT framework, primarily in partial analyses of individual SMEFT sectors. Recent work has been devoted to performing a global, simultaneous fit of all available data [29][30][31][32][33][34][35][36][37][38][39][40], and to studying the interplay between SMEFT fits and the extraction of parton distributions from data [39,41].
Our focus in this manuscript is on semi-leptonic four-fermion operators in the SMEFT. These coefficients are not constrained by current global fits of top quark, Higgs boson and electroweak data [37,40]. While they can be probed by low-energy data [15], the strongest bounds come from Drell-Yan data at the LHC. Previous results have shown that existing Drell-Yan data is precise enough to probe dimension-8 operators in the SMEFT [42][43][44]. These works focused on measurements of the invariant mass distribution of the lepton pair in the Drell-Yan process. A motivation of our paper is to demonstrate that LHC data sets not originally intended as new physics searches can be sensitive to unprobed regions of the SMEFT parameter space, and therefore have unexpected sensitivity to physics beyond the SM. In particular we focus on the recent CMS measurement of the doubly-differential distribution of invariant mass and transverse momentum in the Drell-Yan process [45], intended as a probe of QCD dynamics. The measurement of transverse momentum makes this data set sensitive to partonic processes containing emission of gluons. These gluons can either be radiated from external legs, or directly from some heavy state that carries QCD color. In this second case they match to semi-leptonic four-fermion operators with an additional QCD field-strength tensor. Such operators first appear at dimension-8 in the SMEFT and are unconstrained by other data sets.
To illustrate our results we focus on a representative example in which only operators containing right-handed fields have non-zero Wilson coefficients. In this scenario our parameter space consists of three categories of operators: a dimension-6 four-fermion operator, two momentum-dependent dimension-8 operators that grow with energy and that have been considered in previous work [42,43], and a single CP-even semi-leptonic four-fermion operator with a gluon field-strength tensor that we henceforth label a gluonic operator. We show that a joint measurement of invariant mass and transverse momentum allows the gluonic operator to be probed independently of the other operators, as it has a distinct dependence on transverse momentum. We stress that the determination of Wilson coefficients for all three operator categories provides a useful diagnostic tool that distinguishes possible ultraviolet (UV) completions of the SMEFT. Although our primary interest is in the bottom-up analysis of the possible SMEFT parameter space, we consider the matching of example Z′ and vector leptoquark states to this sector of the SMEFT, and show that they lead to very different patterns of Wilson coefficients for these three operator categories. Although current LHC data provides only weak constraints on the gluonic operator, we study the potential of the high-luminosity LHC (HL-LHC) to probe these effects, and find that significant bounds on all three categories of effects can be obtained. We encourage this measurement to be performed again as larger LHC data sets become available.
Our paper is organized as follows. We review in Section 2 details of the SMEFT needed for our analysis. In Section 3 we study the matching of example UV states onto the four-fermion sector of the SMEFT. Our emphasis in this section is to show that very different patterns of Wilson coefficients can be obtained from different UV states, motivating the measurement of all possible operator types. In Section 4 we show that the doubly-differential distribution in invariant mass and transverse momentum can simultaneously probe both the regular and gluonic semi-leptonic four-fermion operators. We perform fits to the current data in Section 5, and to simulated HL-LHC data in Section 6. We conclude in Section 7.

Review of the SMEFT
We review in this section aspects of the SMEFT relevant for our analysis of the Drell-Yan process. The SMEFT is an effective field theory extension of the SM that includes terms suppressed by an energy scale $\Lambda$. Beyond this scale the ultraviolet completion of the EFT becomes important, and new particles beyond the SM appear. In our study we keep terms through dimension-8 in the $1/\Lambda$ expansion, and ignore operators of odd dimension, which violate lepton number. Our Lagrangian becomes where the ellipsis denotes operators of higher dimensions. The Wilson coefficients defined above are dimensionless. Cross sections computed through $O(1/\Lambda^4)$ will have contributions from the square of dimension-6 operators, as well as interferences between dimension-8 operators and the SM. The categories of operators contributing to the Drell-Yan process through dimension-8 were extensively cataloged in Ref. [43]. At the dimension-6 level three categories of operators contribute: corrections to the three-point vertices of gauge bosons with fermions, four-fermion operators, and dipole operators coupling fermions to gauge bosons. The vertex corrections lead to effects that scale with energy as $O(v^2/\Lambda^2)$, where $v$ denotes the Higgs vacuum expectation value. These are subleading at high energies compared to the four-fermion operators that scale as $O(s/\Lambda^2)$, and are strongly constrained by Z-pole observables [46]. We therefore neglect these terms in our analysis. We additionally assume minimal flavor violation for the structure of our Wilson coefficients. This assumption makes all dipole operators, as well as all scalar and tensor four-fermion operators, proportional to SM Yukawa couplings. These couplings are small for the processes considered here, and can be safely neglected. This leaves us with only vector-like four-fermion operators contributing at dimension-6. The contributing terms are summarized below in Table 1.
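For orientation, the expansion described in this paragraph has the following standard schematic form. This is a sketch in a commonly used notation, not a transcription of the original equation: the symbols $C_i^{(d)}$ and $O_i^{(d)}$ for the dimension-$d$ Wilson coefficients and operators are labels assumed here.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\rm SM}
  \;+\; \frac{1}{\Lambda^{2}} \sum_i C_i^{(6)}\, O_i^{(6)}
  \;+\; \frac{1}{\Lambda^{4}} \sum_i C_i^{(8)}\, O_i^{(8)}
  \;+\; \ldots
```

With this normalization the coefficients $C_i^{(d)}$ are dimensionless, and a cross section through $O(1/\Lambda^4)$ receives the SM-dimension-6 interference at $O(1/\Lambda^2)$, plus the dimension-6 squared and SM-dimension-8 interference terms at $O(1/\Lambda^4)$.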
$O_{ed}$   $(\bar{e}\gamma^\mu e)(\bar{d}\gamma_\mu d)$
Table 1: Dimension-6 four-fermion operators contributing to Drell-Yan at leading order in the coupling constants.

Here $q$ and $l$ denote left-handed quark and lepton doublets, while $u$, $d$ and $e$ denote right-handed singlets for the up quarks, down quarks and leptons, respectively. $\tau^I$ denote the SU(2) Pauli matrices. Several classes of operators contribute at the dimension-8 level. Considering first the leading-order four-fermion process, we again have corrections to the $ffV$ vertices, four-fermion operators with Higgs insertions, and four-fermion operators with derivative insertions. The first two categories of operators were shown in Ref. [43] to be negligible for reasonable values of the Wilson coefficients, consistent with their energy scaling: $O(v^4/\Lambda^4)$ for the first category and $O(s v^2/\Lambda^4)$ for the second. The four-fermion operators with derivative insertions scale as $O(s^2/\Lambda^4)$ and are non-negligible. They are shown in Table 2. We note that the type-II operators lead to novel angular dependence [42], but vanish upon integration over angles up to small corrections due to acceptance cuts [43]. While we discuss them when matching specific UV examples to the SMEFT, we do not consider them in our numerical analysis since the distributions considered here show little sensitivity to these effects. A proposal for a series of angular measurements to probe these terms was given in [42,47].
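The hierarchy among these energy scalings can be made concrete with a rough numerical estimate. The sketch below is pure power counting under stated assumptions: all couplings, Wilson coefficients and parton-luminosity effects are set aside, the derivative operators are taken to scale as $s^2/\Lambda^4$ as dimensional analysis requires, and the function name and the sample inputs (a partonic energy of 1 TeV and $\Lambda = 4$ TeV) are illustrative choices rather than values taken from the analysis.

```python
# Order-of-magnitude power counting only. Prefactors, couplings and parton
# luminosities are ignored; the inputs are illustrative assumptions.
V_HIGGS_TEV = 0.246  # Higgs vacuum expectation value v, in TeV


def scaling_estimates(sqrt_s_tev, lambda_tev):
    """Return the dimensionless scaling factor of each operator category."""
    s = sqrt_s_tev ** 2
    v2 = V_HIGGS_TEV ** 2
    lam2 = lambda_tev ** 2
    return {
        "dim6_vertex": v2 / lam2,              # O(v^2 / Lambda^2)
        "dim6_four_fermion": s / lam2,         # O(s / Lambda^2)
        "dim8_vertex": v2 ** 2 / lam2 ** 2,    # O(v^4 / Lambda^4)
        "dim8_higgs": s * v2 / lam2 ** 2,      # O(s v^2 / Lambda^4)
        "dim8_derivative": s ** 2 / lam2 ** 2, # O(s^2 / Lambda^4)
    }


if __name__ == "__main__":
    for name, size in scaling_estimates(1.0, 4.0).items():
        print(f"{name:20s} {size:.2e}")
```

At energies well above $v$ the four-fermion and derivative terms dominate their respective dimensions, which is the ordering used in the text to decide which operators to keep.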
In our study we will consider the transverse momentum spectrum in Drell-Yan. In this case we also need to consider dimension-8 operators with a gluon field-strength inserted. The possible operators of this form were enumerated in Refs. [4,5]. We organize them according to their CP transformation properties in Table 3. These transformation rules can be obtained using results from any standard QFT text, or from studying the structure of an explicit amplitude calculation. Since the CP-odd operators do not interfere with the tree-level SM amplitudes we do not consider them in this study.

Example UV models
Table 3: Dimension-8 four-fermion operators with a gluon field that contribute to the Drell-Yan transverse momentum spectrum, organized according to their CP transformation properties. $\tilde{G}$ denotes the dual field-strength tensor.

Our primary interest in this paper is the study of the SMEFT from the bottom-up, without reference to explicit UV models. Without further experimental guidance as to the form of new physics it is important to fully explore the possible parameter space without the introduction of theoretical biases. Another motivation of our work is to determine what particular experimental data sets, in this case the measurement of the Drell-Yan transverse momentum spectrum at high invariant mass, can teach us about different sectors of the SMEFT. However, we do wish to demonstrate how the operators enumerated in Section 2 can be obtained from explicit UV examples. In particular this study will show that the dimension-8 effects can serve as a useful diagnostic tool to distinguish between different UV states. We will study only example heavy particles here, and briefly mention how they can be embedded into full UV models. More detailed examples of the matching of full UV models to the dimension-8 level are discussed in the literature [48,49]. In our numerical studies we will focus on the example of all-singlet fermion operators. We consider two UV examples that can lead to the operators of interest here: a Z′ and a vector leptoquark.

Right-handed Z′ model
We first study a Z′ boson coupled to SM singlets. The Lagrangian for this state is given by Here, $g_R^f$ denotes the charge of fermion $f$ under the U(1) gauge group, while $g_{Z'}$ is an overall coupling strength of the Z′ that is extracted from the charges. Although it is not our intent here to discuss full UV models we note that this state can be embedded into an anomaly-free U(1) gauge theory with additional fermionic matter that can be taken heavy, in which case the charges $g_R^f$ are fixed [50]. To determine the Wilson coefficients that this Lagrangian generates for the operators introduced in Section 2, we compute the process $u_1 \bar{u}_2 \to l_3 \bar{l}_4$ to fix $C_{eu}$ and $C^{(1)}_{e^2u^2D^2}$. We also compute $u_1 \bar{u}_2 \to l_3 \bar{l}_4 g_5$ to fix $C_{e^2u^2G}$. A straightforward calculation of the amplitude expanded in the limit $s \ll M_{Z'}^2$, where $s = (p_1 + p_2)^2$ is the usual partonic Mandelstam invariant, leads to From this we can read off the Wilson coefficients: An identical matching calculation for the down-quark channel fixes $C_{ed}$ and C (1) For simplicity, in the rest of this work we neglect these down-quark Wilson coefficients, and focus on the up-quark sector. We note that upon factoring out the dependence on the dimensionful scale $M_{Z'}$ the Wilson coefficients at dimension-6 and dimension-8 are identical in magnitude; there is no suppression of the dimension-8 coefficient.
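The equal magnitude of the dimension-6 and dimension-8 coefficients can be traced to the expansion of the heavy propagator. As a schematic sketch, with all couplings and spinor structures stripped off (a simplification made here purely for illustration, not the full matching calculation), the s-channel propagator of a heavy vector of mass $M_{Z'}$ expands as:

```latex
\frac{1}{s - M_{Z'}^{2}}
  \;=\; -\frac{1}{M_{Z'}^{2}}
  \left[\, 1 + \frac{s}{M_{Z'}^{2}}
        + \mathcal{O}\!\left(\frac{s^{2}}{M_{Z'}^{4}}\right) \right],
  \qquad s \ll M_{Z'}^{2} .
```

The first term has the form of a dimension-6 contact interaction and the second that of a dimension-8 interaction with two derivatives. Both multiply the same product of couplings, so once the powers of $M_{Z'}$ are removed no additional numerical suppression appears at dimension-8.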
We now consider the process $u_1 \bar{u}_2 \to l_3 \bar{l}_4 g_5$. At tree-level two diagrams contribute, with the additional gluon radiated from either initial-state quark. In both cases there is no hard scale in the virtual quark propagator, and the only expansion possible is for the Z′ propagator. This indicates that this process is completely determined by the emission of a gluon after the insertion of either $O_{eu}$ or $O_{e^2u^2D^2}$, and consequently $C_{e^2u^2G} = 0$.

Vector leptoquark model
We now consider a vector leptoquark coupled to right-handed leptons and quarks. The general Lagrangian for such a state is given in Refs. [51,52]. We assume a leptoquark coupled to QCD, and coupled to right-handed up quarks and leptons, for which the Lagrangian takes the form Here, the Roman indices $i, j$ denote color indices in the fundamental representation. The quantity $G$ denotes the field strength tensor of the SM gluon field. The field strength tensor and covariant derivatives of the leptoquark are given by The coupling $\kappa_U$ is related to the magnetic moment of the leptoquark. We note that it has been argued that complete leptoquark models generically contain Z′ bosons as well [53]. These lead to Wilson coefficient contributions similar to those discussed in Section 3.1, and are not explicitly considered here since we focus instead on the unique aspects of the vector leptoquark. We also note that leptoquarks have received renewed interest recently due to their possible role in resolving outstanding flavor anomalies [53]. We compute the same partonic processes as before to determine the Wilson coefficients. A straightforward calculation of $u_1 \bar{u}_2 \to l_3 \bar{l}_4$ leads, after a Fierz rearrangement, to the amplitude In order to match this amplitude to the operators considered previously we have to decompose the $t$ in the numerator according to the contributions to the operators O (1) Doing so we arrive at the Wilson coefficients for the operators considered here: We note that the leptoquark also contributes to $C^{(2)}_{e^2u^2D^2}$, unlike the Z′. This is the first example of how dimension-8 coefficients can help distinguish between models. A non-zero $C^{(2)}_{e^2u^2D^2}$, which can be determined from an analysis of the type considered in [42], would disfavor a Z′ UV model.
Another difference between the Z′ and leptoquark comes when we consider the gluonic process $u_1 \bar{u}_2 \to l_3 \bar{l}_4 g_5$. There is a tri-linear coupling $UUg$ which means that the gluon can be emitted from the t-channel leptoquark. Upon expanding around large $M_U$, this leads to a local contribution described by the operator $O_{e^2u^2G}$. We relegate details of the matching calculation to Appendix A, and simply note the result here: This again illustrates our point that measurement of the complete sector of four-fermion operators can discriminate between UV models. $C_{e^2u^2G}$ is not induced in Z′ models, but it is in vector leptoquark models. We note as well the following points regarding the Wilson coefficients found in this calculation: upon removal of the dimensionful quantity $M_U$ the dimension-6 and dimension-8 Wilson coefficients are similar in size, and the coupling $C_{e^2u^2G}$ can be larger than $C^{(1)}_{e^2u^2D^2}$ for negative $\kappa_U$. We will refer to both of these points during discussions in later sections. It has been pointed out that positivity bounds on dimension-8 Wilson coefficients can be derived from the underlying principles of quantum field theory [54,55]. We note that the simplest elastic positivity constraints do not restrict the parameter space of the $C^{(1)}_{e^2u^2D^2}$ and $C_{e^2u^2G}$ coefficients [47].

Motivation for the doubly-differential Drell-Yan distribution
A main motivation of our work is to demonstrate that LHC data sets not originally intended as new physics searches can be sensitive, sometimes in novel ways, to the SMEFT parameter space, and therefore have unexpected sensitivity to physics beyond the SM. As an example, the CMS experiment has measured the lepton-pair transverse momentum spectrum in several invariant mass bins ranging up to 1 TeV [45]. Results are normalized to the Z-peak region in order to reduce the dependence of the measurement on systematic errors. The measurement was performed at 13 TeV with 36.3 fb$^{-1}$ of integrated luminosity. Before performing fits of this data to the SMEFT framework to demonstrate its sensitivity we describe why this data set is particularly interesting to probe the set of operators described in the previous sections. We show below in Figs. 1, 2, and 3 the ratio of SMEFT corrections to the SM result as a function of transverse momentum for the two highest invariant mass bins available in the measurement of Ref. [45], turning on the three Wilson coefficients $C_{eu}$, $C_{e^2u^2D^2}$, and $C_{e^2u^2G}$ separately (we drop the superscript on $C_{e^2u^2D^2}$ henceforth, since we do not consider $C^{(2)}_{e^2u^2D^2}$ further in this work). The SM has been computed at NLO in QCD using the program MCFM [56]. We have set each Wilson coefficient to unity when making these plots. Since the coefficient $C_{eu}$ contributes first at dimension-6 it shows the largest deviations from the SM result. However, the deviation does not vary significantly with $p_T$. This is consistent with the structure of the dimension-6 EFT correction, which does not depend on the momentum flow into the effective vertex. The deviations for both $C_{e^2u^2D^2}$ and $C_{e^2u^2G}$ increase with $p_T$, consistent with the fact that the effective vertex depends on the momentum flow. While the $C_{e^2u^2D^2}$ deviation increases moderately with transverse momentum, the $C_{e^2u^2G}$ deviation increases rapidly. The EFT vertex for this operator is directly proportional to the gluon momentum, and therefore to the $p_T$ of the lepton pair by momentum conservation. This is the motivation for the analysis of this data set within the SMEFT framework: the $p_T$ distribution offers additional sensitivity to gluonic operators not present with invariant mass distributions alone.

Calculational framework and fits to the current data
We begin by performing a fit of the current CMS measurement to the SMEFT framework. Although we will find that there is limited sensitivity to the gluonic operators at this point, we find it useful to quantify the current sensitivity and to also establish our notation for later sections. We will for simplicity focus on operators that contain right-handed fermions only. Possible UV models leading to such operators were discussed in Section 3, where we introduced the possibility of using $C_{e^2u^2G}$ to distinguish between them. We focus on invariant mass bins above the Z-peak, since the SMEFT-induced corrections grow with energy. We also focus on transverse momentum bins above 50 GeV, following the same logic as for invariant mass, and also to avoid phase space regions where $p_T$ resummation may play a role. This leaves us with the eight bins shown below in Table 4. We note that these bins are normalized by the experiment to the Z-peak region $76 \le m_{ll} \le 106$ GeV.
Table 4: Bins in invariant mass and transverse momentum used in Ref. [45]. The $p_T$ values refer to bin boundaries. There is an upper cut of 1 TeV on $p_T$ in all bins.

We now give the details of our calculational framework. We compute the SM cross section at next-to-leading order (NLO) in QCD using the MCFM program [56]. We use the NNPDF 3.1 NLO parton distribution functions [57]. To compute the PDF errors we follow the standard procedure for Monte Carlo replica sets [57]. To estimate the error arising from higher-order QCD corrections we set the renormalization and factorization scales to the central value and vary them around this value in an uncorrelated way according to We find the largest variation within this range, and form a symmetric scale uncertainty using this largest variation. We note that this technique leads to slightly more conservative errors than the usual approach, in which the width of the scale variation band without symmetrization is used. We note that the PDF uncertainties are strongly correlated between different bins. We assume that the scale uncertainties are uncorrelated between bins. Since this data involves bins at high energies, it is important to quantify the effect of electroweak Sudakov corrections. We are unaware of a publicly available code that computes the electroweak Sudakov logarithms as a function of $p_T$ off the Z-peak. To estimate the impact of these corrections we compute the next-to-leading-logarithmic electroweak Sudakov corrections [58] for each invariant mass bin integrated inclusively over $p_T$, apply this correction to each of the $p_T$ bins for that invariant mass, and assign half of this correction as an additional theoretical error. The lower bin boundaries in $p_T$, which provide the largest contributions to each bin, do not go above 160 GeV. For the two higher invariant mass bins in our analysis this $p_T$ value is less than the invariant mass. Since the Sudakov logarithms grow with the Mandelstam variable $s$ that enters the process, and $s$ is dominated by the lepton invariant mass for the reason stated above, we believe that this is a reasonable estimate. We note that their effect ranges from 1% to 4% as we increase the invariant mass bin, and has little effect on the quality of the fit to data. For the SMEFT cross section, we work at leading order in QCD.
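The two theory-error prescriptions described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and all numerical inputs are hypothetical, the list of scale variations stands in for whatever set of scale choices is actually scanned, and the electroweak correction is represented by a single relative factor per invariant mass bin.

```python
def symmetric_scale_uncertainty(central, variations):
    """Largest deviation from the central prediction, used as a +/- error."""
    return max(abs(v - central) for v in variations)


def envelope_half_width(central, variations):
    """Half the width of the scale-variation band (the unsymmetrized band)."""
    values = list(variations) + [central]
    return 0.5 * (max(values) - min(values))


def apply_sudakov(sigma_pt_bins, delta_ew):
    """Apply one pT-inclusive relative EW correction to every pT bin.

    Returns the corrected cross sections and, for each bin, half of the
    applied shift as an additional theory error.
    """
    corrected = [s * (1.0 + delta_ew) for s in sigma_pt_bins]
    errors = [abs(0.5 * delta_ew * s) for s in sigma_pt_bins]
    return corrected, errors


if __name__ == "__main__":
    central = 100.0
    variations = [92.0, 97.0, 103.0, 105.0]  # hypothetical scale choices
    print(symmetric_scale_uncertainty(central, variations))  # 8.0
    print(envelope_half_width(central, variations))          # 6.5
    print(apply_sudakov([10.0, 5.0], -0.04))
```

The symmetrized error is never smaller than half the band width, which is the sense in which this prescription is the more conservative of the two.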
We use the experimental uncertainties as provided by the CMS collaboration. These uncertainties range from 1.5% to 8.9%, increasing with both invariant mass and transverse momentum, and contain a mix of both correlated and uncorrelated errors. The systematic uncertainties are dominant in the high invariant mass bins. We define a $\chi^2$ test to quantify the deviation of the SMEFT cross sections from the SM: where $\Delta\sigma^2_{ij}$ signifies the error matrix composed of both theoretical and experimental uncertainties. We then extract the 95% CL bounds on the Wilson coefficients based on $\chi^2$ fits. Before studying the SMEFT we note that the SM furnishes an acceptable fit to the data, with a $\chi^2$ per degree of freedom of 1.4. We consider turning on only a single operator at a time, turning on pairs of operators, and turning on all three. Figure 4 shows the 95% CL ellipses of $C_{eu}$ together with either $C_{e^2u^2G}$ or $C_{e^2u^2D^2}$, as well as the bounds with only one Wilson coefficient enabled. Although the data is less sensitive to $C_{e^2u^2G}$ than to $C_{e^2u^2D^2}$, the circular nature of the ellipse in Figure 4 indicates that $C_{e^2u^2G}$ has little correlation with $C_{eu}$. The stretched narrow ellipse of $C_{eu}$ and $C_{e^2u^2D^2}$ shows a strong correlation between these two operators, with the effects of the two coefficients indistinguishable with the current data.
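A $\chi^2$ built from a full error matrix can be sketched as follows. The code assumes the standard quadratic form, $\chi^2 = \sum_{ij} r_i\, [\Delta\sigma^2]^{-1}_{ij}\, r_j$ with $r_i$ the difference between prediction and measurement in bin $i$, and an error matrix assembled from one uncorrelated and one fully correlated source. Both the functional form and the helper names are assumptions for illustration; the error matrix used in the analysis also contains PDF and scale contributions.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def error_matrix(uncorrelated, correlated):
    """Covariance with diagonal uncorrelated errors plus one shared source."""
    n = len(uncorrelated)
    return [
        [(uncorrelated[i] ** 2 if i == j else 0.0) + correlated[i] * correlated[j]
         for j in range(n)]
        for i in range(n)
    ]


def chi2(prediction, data, cov):
    """Quadratic form r^T cov^{-1} r, with r = prediction - data."""
    resid = [p - d for p, d in zip(prediction, data)]
    return sum(r * x for r, x in zip(resid, solve_linear(cov, resid)))


if __name__ == "__main__":
    cov = error_matrix([1.0, 1.0], [1.0, 1.0])
    print(chi2([1.0, 1.0], [0.0, 0.0], cov))   # common shift: small chi2
    print(chi2([1.0, -1.0], [0.0, 0.0], cov))  # opposite shifts: larger chi2
```

The last two lines show why correlations matter for the fit: a shift common to all bins is partly absorbed by the correlated uncertainty, while a shape change is not.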
A potential issue that must be addressed when studying these constraints is the convergence of the EFT expansion. As can be seen from the examples in Section 3 we have the following rough relations between parameters in UV models and those appearing in the SMEFT: We want any potential resonance to lie above the scales probed experimentally. We take this constraint to be $M > 1$ TeV, the highest scale probed in this data, which translates to the bounds The exact numerical value of this constraint depends on the coupling $g$, and therefore on the details of the UV model. We take the strong coupling limit $g = \sqrt{4\pi}$, leading to the least stringent constraint, in order to avoid ruling out allowed parameter space. This slightly restricts the allowed parameter space in the joint $(C_{eu}, C_{e^2u^2G})$ fit in Fig. 4. It does not affect the joint $(C_{eu}, C_{e^2u^2D^2})$ result.
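The effective scale constraint can be turned into a numerical ceiling on a dimension-8 coefficient. The sketch below assumes the rough relation $C^{(8)}/\Lambda^4 \sim g^2/M^4$, which is consistent with the statement later in the text that the dimension-8 effective scale is the resonance mass rescaled by $\sqrt{g}$; the function names are hypothetical and the sample values of $\Lambda$ and of the minimum mass are illustrative.

```python
import math

STRONG_COUPLING_G2 = 4.0 * math.pi  # g^2 in the strong-coupling limit g = sqrt(4 pi)


def max_dim8_coefficient(lambda_tev, m_min_tev, g_squared=STRONG_COUPLING_G2):
    """Largest |C8| compatible with M > m_min, assuming C8/Lambda^4 ~ g^2/M^4."""
    return g_squared * (lambda_tev / m_min_tev) ** 4


def implied_resonance_mass(c8, lambda_tev, g_squared=STRONG_COUPLING_G2):
    """Resonance mass M (TeV) implied by a coefficient under the same relation."""
    return lambda_tev * (g_squared / abs(c8)) ** 0.25


if __name__ == "__main__":
    ceiling = max_dim8_coefficient(lambda_tev=4.0, m_min_tev=3.0)
    print(f"|C8| ceiling: {ceiling:.1f}")
    print(f"implied M at the ceiling: {implied_resonance_mass(ceiling, 4.0):.2f} TeV")
```

Because the ceiling grows with $g^2$, taking the largest coupling gives the weakest restriction, which matches the reasoning given for choosing the strong-coupling limit.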
We list the 95% CL bounds with only one operator enabled in Table 5. We also calculate the bounds with multiple operators by marginalizing over the couplings, and further impose the effective scale constraint on the marginalized bounds in the final column of the table. We observe that $C_{e^2u^2D^2}$ has a strong impact on the bounds on $C_{eu}$, while $C_{e^2u^2G}$ only mildly changes this limit. Turning on $C_{e^2u^2D^2}$ significantly weakens the bounds on $C_{eu}$, an effect also observed with LHC invariant mass distributions in [43]. In general there is limited sensitivity of this data set to the dimension-8 coefficients, with Wilson coefficients reaching O(100) still allowed.

Fits to simulated HL-LHC data
Table 5: 95% CL bounds for the Wilson coefficients $C_{eu}$, $C_{e^2u^2G}$ and $C_{e^2u^2D^2}$ from current CMS data. The first column shows the bounds assuming one operator is enabled at a time. The second column shows the bounds on a given coefficient with the other enabled operators allowed to vary as well. The third column shows these bounds with the dimension-8 coefficients restricted according to the discussion in the main text.

We found in the previous section that the current data shows little sensitivity to $C_{e^2u^2G}$. We consider next the potential of the high-luminosity LHC to probe this SMEFT parameter space through a similar analysis. Since no data is yet available we resort to pseudodata generated with the NLO SM cross section. The HL-LHC pseudodata is generated under similar conditions as the CMS measurement [45]. We assume a center-of-mass energy $\sqrt{s} = 14$ TeV, an integrated luminosity of 3 ab$^{-1}$ and a dilepton transverse momentum cut $p_T \ge 100$ GeV. Since the eventual HL-LHC binning is unknown, we consider two possible sets of bins in the dilepton invariant mass and transverse momentum. The binning for the dilepton invariant mass $m_{ll}$ is motivated by the simulation in [59]. For the binning of the dilepton transverse momentum $p_T(ll)$, we enforce that the relative statistical uncertainty of each bin cannot exceed 10%. As such, we discard the highest $m_{ll}$ bin in [59], where $2600 \le m_{ll} \le 14000$ GeV. Next, two different binning strategies are applied: a coarse binning where the relative statistical uncertainty of each bin should be smaller than 5% if possible (the only exception is $2000 \le m_{ll} \le 2600$ GeV, where the largest possible $p_T$ bin is $100 \le p_T \le 7000$ GeV; the relative statistical uncertainty of this bin is larger than 5%, but still smaller than 10%); and a fine binning where the relative statistical uncertainty of each bin must be smaller than 10%. We show the explicit bins used in Appendix B, with the coarse binning shown in Table 6 and the fine binning in Table 7. We assume that the pseudodata is affected by three sources of experimental uncertainties: the statistical uncertainty $\Delta\sigma_{\rm stat}$, the uncorrelated systematic uncertainty $\Delta\sigma_{\rm uncorr}$ and the
fully correlated systematic uncertainty $\Delta\sigma_{\rm corr}$. We construct the cross section of bin $b$ using where $r_b$ and $r$ are random numbers generated with a normal distribution of mean 0 and standard deviation 1. For uncorrelated uncertainties, a separate random number is chosen for each bin. For the correlated uncertainty, a single random number $r$ is used across all bins. We assume the relative uncorrelated systematic uncertainty $\Delta\sigma_{{\rm uncorr},b}/\sigma^{\rm SM}_b = 1\%$ and the relative correlated systematic uncertainty $\Delta\sigma_{{\rm corr},b}/\sigma^{\rm SM}_b = 2\%$. These choices are consistent with the current values found in the CMS measurement [45]. We normalize the cross section to the Z-peak region $76 \le m_{ll} \le 106$ GeV. We emphasize that the relative uncorrelated uncertainty of the Z-peak region is still assumed to be 1%, but any correlated uncertainties in this region are disregarded. The error matrix contains the experimental uncertainties, the theoretical PDF uncertainties and the scale uncertainties. The experimental part is constructed with $\Delta\sigma_{\rm stat}$, $\Delta\sigma_{\rm uncorr}$ and $\Delta\sigma_{\rm corr}$. The PDF and scale uncertainties only contain SM contributions. For each set of random numbers $r_b$ and $r$ generated, we perform a $\chi^2$ fit as described in Section 5. Each set of random numbers signifies one pseudo-experiment, and for each pseudo-experiment $e$, a set of best-fit Wilson coefficients $\{C_{i,e}\}$ is obtained. A total of 1000 pseudo-experiments are evaluated, and the average best-fit values are obtained by The covariance matrix for each pseudo-experiment is given by The 95% CL bounds on the Wilson coefficients are extracted by where $N$ is the number of Wilson coefficients. For $N = 1, 2, 3$, $\Delta\chi^2 = 3.841, 5.991, 7.815$, respectively. The average inverse covariance matrix is defined as We now consider fits to the pseudodata with one, two or three Wilson coefficients enabled. Figure 5 shows the 95% CL bounds on $C_{eu}$ and $C_{e^2u^2G}$ with either one, two or all three operators enabled.
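The quoted $\Delta\chi^2$ thresholds are the 95% quantiles of the $\chi^2$ distribution with $N$ degrees of freedom, and they can be cross-checked with the standard library alone. The sketch below uses the closed-form cumulative distributions for $N = 1, 2, 3$ and a bisection search; the function names are illustrative.

```python
import math


def chi2_cdf(x, ndof):
    """Closed-form chi-square CDF for 1, 2 or 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    if ndof == 1:
        return math.erf(math.sqrt(x / 2.0))
    if ndof == 2:
        return 1.0 - math.exp(-x / 2.0)
    if ndof == 3:
        return (math.erf(math.sqrt(x / 2.0))
                - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only ndof = 1, 2, 3 are implemented in this sketch")


def delta_chi2_threshold(ndof, confidence=0.95):
    """Find x with CDF(x) = confidence by bisection on [0, 100]."""
    low, high = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        if chi2_cdf(mid, ndof) < confidence:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(delta_chi2_threshold(n), 3))
```

For $N = 2$ the result also follows analytically as $-2\ln(0.05)$.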
Figure 6 shows the 95% CL bounds on $C_{eu}$ and $C_{e^2u^2D^2}$. The round shape of the ellipse in Figure 5 and the narrow shape in Figure 6 confirm what we learned from the existing CMS data: there is little correlation between $C_{eu}$ and $C_{e^2u^2G}$, and a stronger correlation between $C_{eu}$ and $C_{e^2u^2D^2}$. We observe that the inclusion of $C_{e^2u^2D^2}$ loosens the bounds on $C_{eu}$, while the inclusion of $C_{e^2u^2G}$ has much less impact on $C_{eu}$. This is consistent with the previous observation that the inclusion of transverse momentum data provides a separate handle on the gluonic operators. We also observe that the fine binning leads to tighter bounds than the coarse binning. The gray area in Figure 5 shows the region where the effective scale constraint of the dimension-8 operator is violated, and the EFT expansion is no longer valid. We demand the constraint $M > 3$ TeV, consistent with the upper limit of our invariant mass binning, so that the dimension-8 Wilson coefficients must satisfy

The left diagram shows the bounds with the coarse binning, while the right one shows the bounds with the fine binning. The blue line denotes the bounds with only one of the operators enabled, the green line denotes the bounds with two operators enabled, and the orange line denotes the bounds with $C_{e^2u^2G}$ also enabled. The energy scale $\Lambda$ is set to 4 TeV, and the effective scale constraint is set to 3 TeV.
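The pseudo-experiment procedure described above can be sketched as follows. The smearing formula is an assumption: each bin is shifted by its own random number times the quadrature sum of the statistical and uncorrelated systematic errors, plus a single shared random number times the correlated error. The function names, the sample cross sections and the random seed are likewise illustrative.

```python
import math
import random


def make_pseudodata(sigma_sm, stat_err, rel_uncorr=0.01, rel_corr=0.02, rng=None):
    """Return one pseudo-experiment: a smeared SM cross section per bin."""
    rng = rng or random.Random()
    r_common = rng.gauss(0.0, 1.0)      # single r shared by all bins
    smeared = []
    for sigma, stat in zip(sigma_sm, stat_err):
        r_bin = rng.gauss(0.0, 1.0)     # independent r_b for this bin
        # Assumption: statistical and uncorrelated systematic errors are
        # combined in quadrature and share the per-bin random number.
        uncorr = math.hypot(stat, rel_uncorr * sigma)
        smeared.append(sigma + r_bin * uncorr + r_common * rel_corr * sigma)
    return smeared


def average_best_fit(best_fits):
    """Average each Wilson coefficient over all pseudo-experiments."""
    n = len(best_fits)
    return [sum(column) / n for column in zip(*best_fits)]


if __name__ == "__main__":
    rng = random.Random(2024)
    sm, stat = [100.0, 10.0], [1.0, 0.5]   # hypothetical bins
    experiments = [make_pseudodata(sm, stat, rng=rng) for _ in range(1000)]
    # Averaging the pseudodata themselves should recover the SM input.
    print(average_best_fit(experiments))
```

In the analysis each pseudo-experiment would be passed through the $\chi^2$ fit and the resulting best-fit coefficients averaged; here the averaging helper is applied directly to the smeared cross sections only to show that the smearing is unbiased.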
We observe that the HL-LHC data has the potential to measure these three couplings separately. Although some correlation between $C_{eu}$ and $C_{e^2u^2D^2}$ remains, it is weaker than found with the current data, and only weakens the $C_{eu}$ bounds by a factor of two. Referring to the fine binning results, we observe that there is a hierarchy in the sensitivities to these three coefficients: $C_{eu}$ values of O(0.1) can be probed, $C_{e^2u^2D^2}$ values of O(1) can be probed, while the sensitivity to $C_{e^2u^2G}$ drops to O(10). Considering that the EFT expansion parameter is chosen as $\Lambda = 4$ TeV, these results indicate sensitivity reaching into the multi-TeV region for all three operators. We recall that $C_{e^2u^2G}$ can be enhanced in certain regions of leptoquark parameter space, as discussed in Section 3.2. This indicates that the HL-LHC doubly-differential Drell-Yan data can serve as a useful diagnostic tool for realistic UV states. We summarize the potential of the HL-LHC by showing in Fig. 7 the effective UV scales that can be probed for each parameter when only a single coupling is turned on, and when all three are turned on. We recall from Eq. (14) that the effective scale is related to the heavy resonance mass in the UV theory scaled by either $g$ (for dimension-6) or $\sqrt{g}$ (for dimension-8) as shown in Section 5. Effective scales approaching 10 TeV can be probed for $C_{eu}$, while sensitivities reaching several TeV are possible for the dimension-8 coefficients.
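The conversion from a bound on a Wilson coefficient to an effective UV scale can be sketched as below. It assumes the definitions $\Lambda/\sqrt{|C|}$ at dimension-6 and $\Lambda/|C|^{1/4}$ at dimension-8, which follow from the stated scaling with $g$ and $\sqrt{g}$; the function name is hypothetical and the inputs are the order-of-magnitude sensitivities quoted above, not the fitted bounds.

```python
def effective_scale(coefficient, lambda_tev, dimension):
    """Effective scale in TeV: Lambda / |C|^(1/2) (dim-6) or Lambda / |C|^(1/4) (dim-8)."""
    power = {6: 2, 8: 4}[dimension]
    return lambda_tev / abs(coefficient) ** (1.0 / power)


if __name__ == "__main__":
    lam = 4.0  # TeV, the expansion scale chosen in the text
    for label, c, dim in [("C_eu", 0.1, 6),
                          ("C_e2u2D2", 1.0, 8),
                          ("C_e2u2G", 10.0, 8)]:
        print(f"{label:10s} {effective_scale(c, lam, dim):5.1f} TeV")
```

The fourth root at dimension-8 explains why a coefficient two orders of magnitude less constrained than $C_{eu}$ still corresponds to an effective scale of a few TeV.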

Conclusions
In this paper we have studied probes of the semi-leptonic four-fermion sector of the SMEFT that are possible with neutral-current Drell-Yan measurements at the LHC. We have extended previous studies by including dimension-8 operators with additional gluon field-strength tensors. These operators directly modify the high transverse momentum region in Drell-Yan production. A motivation for this work is a recent CMS measurement of the transverse momentum distribution for the Drell-Yan process further binned in invariant mass. Although this measurement was intended primarily as a QCD study, it has novel BSM sensitivity as well, and provides direct access to this previously unexplored sector of the SMEFT.
To motivate our study we have demonstrated that example UV models can lead to very different patterns of Wilson coefficients for these gluonic operators; some states generate potentially sizable Wilson coefficients for these dimension-8 operators, while others do not generate them at all. This ability to discriminate between different UV completions of the SMEFT would be missed if the SMEFT expansion were truncated at the dimension-6 level; the example models considered here would only match to a single dimension-6 operator. Measurement of the entire suite of semi-leptonic four-fermion coefficients through dimension-8 can therefore help distinguish between different models of new physics. We have considered fits of the SMEFT framework to both the current CMS measurement and to simulated future HL-LHC data. While the current data shows little sensitivity to the gluonic operator, there are good prospects for probing this effect with future data. We encourage this measurement to be performed with future data and its BSM potential to be further explored.

A Leptoquark matching
In this Appendix we study the matching of a vector leptoquark to the SMEFT in the process $u_1 \bar{u}_2 \to l_3 \bar{l}_4 g_5$ in order to determine the Wilson coefficient $C_{e^2u^2G}$. There are three contributing diagrams: two where the gluon is emitted from an initial quark, and one where it is emitted from the leptoquark. The amplitudes $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ for the three diagrams follow from the Lagrangian presented in Section 3. We use the notation $t_{15} = (p_1 - p_5)^2$, etc. We can expand these expressions in the limit of large $M_U$. The dimension-6 contribution comes from the first two diagrams; the $t_{15}$, $t_{25}$ in the denominator make it clear that these match to the emission of a gluon off of a dimension-6 four-fermion operator. We focus here on the expansion to dimension-8. At this order the first two diagrams come from emitting a gluon from dimension-8 four-fermion operators, and do not match to a local operator with a gluon. This can be seen from the $t_{15}$, $t_{25}$ in the denominator. The same is true for the first term of $\mathcal{M}_3$. This can be determined most simply by demanding gauge invariance: the amplitude must vanish upon replacing $\epsilon_5 \to p_5$. The first two diagrams are not invariant by themselves. Only upon adding the first term of diagram three is gauge invariance satisfied. This leaves the last term of $\mathcal{M}_3$ to match to a local dimension-8 operator qqllg. To simplify this we apply the following Fierz identity:

$$\bar{v}^i_2 \gamma^\nu P_R v_4 \, \bar{u}_3 \gamma^\mu P_R u^j_1 = \frac{1}{2} \left\{ -g^{\mu\nu} g^{\rho\sigma} + g^{\mu\rho} g^{\nu\sigma} + g^{\mu\sigma} g^{\nu\rho} - i \epsilon^{\mu\nu\rho\sigma} \right\} \bar{v}^i_2 \gamma_\sigma P_R u^j_1 \, \bar{u}_3 \gamma_\rho P_R v_4 .$$
Only the antisymmetric term survives when we plug this into the amplitude. The result matches to the local dimension-8 operator

$$\bar{e} \gamma^\mu e \, \bar{u} \gamma^\nu T^a u \, G^a_{\mu\nu} \qquad (25)$$

which determines the Wilson coefficient $C_{e^2u^2G}$.
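For orientation on the large-$M_U$ expansion used earlier in this Appendix, consider a schematic propagator denominator $t - M_U^2$, where $t$ stands for a generic momentum invariant. This is an illustrative assumption: the full amplitudes carry spinor and color structure that is not reproduced here. Expanding for $|t| \ll M_U^2$ gives

$$\frac{1}{t - M_U^2} = -\frac{1}{M_U^2} \left[ 1 + \frac{t}{M_U^2} + \mathcal{O}\!\left( \frac{t^2}{M_U^4} \right) \right] .$$

The leading $1/M_U^2$ term has the scaling of a dimension-6 contribution, while the $t/M_U^4$ term has the scaling of a dimension-8 contribution.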

B HL-LHC binning
We present in this Appendix the two choices for HL-LHC binning used in our analysis: a coarse binning where the relative statistical uncertainty of each bin must be smaller than 5%, and a fine binning where the relative statistical uncertainty of each bin must be smaller than 10%. The coarse binning is shown in Table 6, while the fine binning is shown in Table 7.

Table 7: The fine binning where the relative statistical uncertainty of each bin must be smaller than 10%. The first column shows the ranges of the $m_{ll}$ bins, and the second column shows the boundaries of the $p_T$ bins.
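A binning criterion of this kind can be illustrated with a short sketch. The code below is a hypothetical greedy procedure, not the one used to build Tables 6 and 7: it assumes Poisson statistics, so that the relative statistical uncertainty of a bin with $N$ expected events is $1/\sqrt{N}$, and merges adjacent $p_T$ bins until each merged bin satisfies the threshold. The function names, bin edges and event counts are invented for illustration.

```python
import math


def rel_stat_unc(n_events):
    """Relative Poisson uncertainty 1/sqrt(N); infinite for an empty bin."""
    return 1.0 / math.sqrt(n_events) if n_events > 0 else math.inf


def merge_bins(edges, counts, max_rel_unc):
    """Greedily merge adjacent bins, left to right, until every merged bin
    has a relative statistical uncertainty below max_rel_unc.

    Hypothetical helper: edges has len(counts) + 1 entries. Any leftover
    events at the high end are folded into the last accepted bin.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have one more entry than counts")
    new_edges = [edges[0]]
    new_counts = []
    accumulated = 0.0
    for i, n in enumerate(counts):
        accumulated += n
        if rel_stat_unc(accumulated) < max_rel_unc:
            new_edges.append(edges[i + 1])
            new_counts.append(accumulated)
            accumulated = 0.0
    if new_edges[-1] != edges[-1]:
        # The tail did not reach the threshold on its own
        if new_counts:
            new_counts[-1] += accumulated
            new_edges[-1] = edges[-1]
        else:
            new_counts.append(accumulated)
            new_edges.append(edges[-1])
    return new_edges, new_counts


if __name__ == "__main__":
    # Invented p_T bin edges (GeV) and expected event counts
    edges = [0, 10, 20, 40, 80, 160]
    counts = [5000, 900, 300, 150, 60]
    print("5% threshold: ", merge_bins(edges, counts, 0.05))
    print("10% threshold:", merge_bins(edges, counts, 0.10))
```

In this toy input the 5% threshold requires more than 400 events per bin and yields three bins, while the 10% threshold requires more than 100 events and yields four. This mirrors why the looser uncertainty requirement corresponds to the finer binning.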