Can New Physics hide inside the proton?

Modern global analyses of the structure of the proton include collider measurements which probe energies well above the electroweak scale. While these provide powerful constraints on the parton distribution functions (PDFs), they are also sensitive to beyond the Standard Model (BSM) dynamics if these affect the fitted distributions. Here we present a first simultaneous determination of the PDFs and BSM effects from deep-inelastic structure function data by means of the NNPDF framework. We consider representative four-fermion operators from the SM Effective Field Theory (SMEFT), quantify to which extent their effects modify the fitted PDFs, and assess how the resulting bounds on the SMEFT degrees of freedom are modified. Our results demonstrate how BSM effects that might otherwise be reabsorbed into the PDFs can be systematically disentangled.

Searches for physics beyond the Standard Model (BSM) at high-energy colliders can be divided into two main categories: direct searches, aiming to detect the production of new heavy particles, and indirect searches, whose goal is to identify subtle deviations in the interactions and properties of the SM particles. The latter would arise from virtual quantum effects involving BSM dynamics at energies well beyond the collider centre-ofmass energy. Both strategies are actively pursued at the LHC by exploiting its unique energy reach [1] and its thriving program of precision measurements [2][3][4].
In this context, the SM Effective Field Theory (SMEFT) [5][6][7][8][9][10][11][12][13] represents a powerful modelindependent approach to identify, interpret, and correlate potential BSM effects from precision measurements under the assumption that the new physics scale, Λ, is well above the energies probed by the experimental data. Here, BSM effects can be parametrised at low energies in terms of dimension-six operators, O i , constructed from SM fields satisfying its symmetries: where L SM is the SM Lagrangian, {a i } are the Wilson coefficients parametrising the high-energy BSM dynamics, and N is the number of non-redundant operators. These Wilson coefficients can be constrained from measurements ranging from Higgs, gauge boson, and electroweak precision observables [14][15][16][17] to top quark production [18,19], flavour observables [20,21], and low- * Electronic address: m.ubiali@damtp.cam.ac.uk energy processes [22], among others. In the case of LHC data, high-energy processes such as Drell-Yan, diboson, and top-quark production at large invariant masses [23][24][25][26][27][28][29] play a key role since energy-growing effects often enhance the sensitivity to the SMEFT contributions. Several of the high-energy LHC measurements that constrain the SMEFT parameter space are also used to provide stringent constraints on the proton's parton distribution functions (PDFs) [30,31]. Prominent examples include the large-x gluon from top-quark pair [32,33] and jet production [34,35], and the quark-flavor separation from high-mass Drell-Yan and W and Z boson production in association with jets [36][37][38][39]. This implies that BSM effects, if present in the high-energy tails of those distributions, could end up being "fitted away" into the PDFs. These concerns are particularly acute for the full exploitation of the Run II and III datasets, as well as from the High-Luminosity phase [40] where many PDFsensitive observables will reach the few-TeV region [41].
In this work, we want to address two main questions. First, how can one assess whether BSM effects have been absorbed into the fitted PDFs? And second, how are the bounds on the SMEFT coefficients modified if the PDFs used as input to determine them had been fitted using a consistent BSM theory? To answer them, we present here a first simultaneous determination of the proton's PDFs and the SMEFT Wilson coefficients {a i } by means of the NNPDF framework [42][43][44][45]. As a proof of concept, we consider the constraints from deep-inelastic scattering (DIS) structure functions on representative fourfermion operators. This way, we are able to quantify to which extent SMEFT effects (which parametrise general BSM dynamics) can be reabsorbed into the flexible neural-network based PDF parametrisation [46]. This has required extending the NNPDF framework such that arXiv:1905.05215v1 [hep-ph] 13 May 2019 cross-sections can be evaluated including BSM corrections at the fit level. We also assess how the bounds on the SMEFT coefficients are modified in this joint fit as compared to the traditional approach where PDFs are kept fixed. See [47,48] for related xFitter [49] studies restricted to H1 and ZEUS data and to one-parameter fits.
Here we study the impact of operators of the form where l R and q R stand for right-handed charged leptons and quarks fields. We assume coupling universality in the lepton sector but not in the quark one, in order to evade the strong constraints from LEP precision data [50]. These operators lead to a energy-growing effects and their contributions are weighted by the corresponding PDFs, two properties that provide powerful handles for discriminating them. The calculation of the SMEFT corrections from the O lq operators in Eq. (2) to DIS structure functions can be performed in analogy with the corresponding SM computation. For instance, F 2 will now contain terms linear and quadratic in a u , the coefficient of O lu in Eq. (1): where ), s W = sin θ W , c W = cos θ W , and u (ū) represents the up (anti-)quark PDF. The terms linear in a u arise from the interference with the SM amplitudes and are suppressed as Q 2 /Λ 2 . Similar expressions to can be evaluated for the contributions from O ld , O ls , and O lc , and for the parity-violating structure function xF 3 , while ∆F smeft L = 0 at leading order. In this work we will keep only the leading O Λ −2 terms in Eq. (3), though we have verified that results are stable upon the addition of the O Λ −4 ones. These SMEFT-augmented structure functions have been implemented into APFEL [51,52]. The DGLAP equations for the scale evolution of the PDFs are unaffected.
Since SMEFT effects are suppressed as Q 2 /Λ 2 , only measurements involving large momentum transfers Q 2 will be sensitive to them. The only DIS experiment that has explored the region Q ∼ > M W is HERA [53], whose legacy structure function data [54] reach up to Q max 250 GeV. In Fig. 1 we display the percentage shift in the e − p neutral current (NC) DIS cross-section, as a function of x and Q 2 , for a representative choice of coefficients given by a u = a c = 0.28 and a d = a s = −0.10. As in the rest of the paper, we assume here that Λ = 1 TeV. The corrections depend only mildly on Bjorken-x and increase rapidly with Q, reaching up to 20% for the upper HERA kinematic limit. Given that for a sizable region of the SMEFT parameter space the shifts Eq. (4) are comparable or bigger than the experimental uncertainties of the precise HERA structure functions, the latter can be exploited to impose bounds on the allowed ranges of the coefficients {a q }. First of all, we evaluate the values of χ 2 tot ({a q }) for the DIS measurements used in NNPDF3.1 [55], corresponding to n dat = 3092 data points from BCDMS, SLAC, NMC, CHORUS, NuTeV, and HERA. In this calculation, we use NNPDF3.1 NNLO DIS-only as input with consistent theory settings such as FONLL-C [56] and fitted charm [57]. This is repeated for a range of SMEFT benchmark points (BPs) (listed in the Appendix) and for the N rep = 100 Monte Carlo replicas. The resulting χ 2 tot ({a q }) values are then fitted to a quadratic form, where H qq are the elements of the Hessian matrix in the quark flavour space. Note that Eq. (5) is exact if the O(Λ −4 ) corrections are neglected, else it is valid only close to a local minimum. We have performed the fits of the SMEFT coefficients both varying a single operator at a time (individual fits) as well as varying the four of them simultaneously and then marginalizing over each one.
In Table I we indicate the 90% confidence level (CL) intervals for the four coefficients obtained with fixed input PDFs. We compare the individual bounds with the marginalised ones from the four-dimensional fits, without and with PDF uncertainties. In the former case, theory calculations are obtained using the central replica. In the latter case, we compute the bounds for the N rep = 100 replicas and take the envelope of the 90% narrower ones.  The most stringent bounds are obtained for a u , followed by a d , and then a c and a s . This is consistent with the fact that the SMEFT corrections proportional to a q are weighted by the corresponding PDFs in Eq. (3), and that in the HERA region u(x) ∼ > d(x) s(x), c(x). The marginalised bounds are looser than the individual ones by up to an order of magnitude, highlighting the relevance of exploring simultaneously the widest possible region of the parameter space. PDF uncertainties turn out to be moderate. For the individual fits, the bounds are stable upon the inclusion of O Q 4 /Λ 4 terms.
The main limitation of the bounds reported in Table I is that they are affected by double counting, since the same HERA data was already included in the NNPDF3.1 fit used here to evaluate the DIS structure functions with SMEFT effects. The very same problem arises for the interpretation of collider measurements that are used to constrain both the PDFs and the SMEFT parameter space, such as jet, Drell-Yan, and top quark pair production. To bypass this limitation, the way forward is provided by the simultaneous extraction of the PDFs and the SMEFT degrees of freedom {a q }, in the same way as in joint extractions of PDFs and the strong coupling constant [58].
We have thus carried out variants of the NNPDF3.1 NNLO DIS-only fit now using as theory input the structure functions with SMEFT corrections. These fits have been performed for the same BPs as in the fixed-PDF analysis, and are based on 300 replicas to tame statistical fluctuations. Defining ∆χ 2 , we find that the BP with the largest improvement (deterioration) with respect to the SM has ∆χ 2 smeft −10 ( 90), see Fig. 2. In all cases, χ 2 tot decreases as compared to the pre-fit (fixed-PDF) result, indicating that SMEFT effects are being partially reabsorbed into the PDFs.
From Fig. 2 one expects that in the fits with SMEFT corrections the resulting PDFs will be distorted as compared to their SM-based counterparts. Here the flexible NNPDF parametrisation is suitable to robustly assess to what extent such effects can be reabsorbed into the PDFs. Firstly, we find that the quark valence distributions are rather similar to those of the SM case, see the Appendix. The reason is that quark PDFs are dominantly fixed by the moderate Q 2 fixed-target DIS  data, and thus unaffected by the high-Q 2 HERA structure functions. More significant differences are observed for the gluon PDF. Within a DIS-only fit, the gluon is mostly constrained from the scaling violations between the low-and high-Q 2 data, which are strongly modified in the presence of energy-growing SMEFT effects. In Fig. 3 we show the gluon in the fits based on the (a u , a d , a s , a c ) = (−0.3, −1.8, −5, 5) and (0, 1.2, 10, 0) BPs, normalised to the SM and where PDF uncertainties are only displayed for the latter. These are two of the BPs leading to the largest deviations from the SM at the χ 2 level, with ∆χ 2 smeft 65 and 41 at the pre-fit level respectively, while also being consistent with the bounds from the HERA data in Table I. We find that the SMEFT-induced distortions can be comparable with the PDF uncertainties and thus should be taken into account. These distortions would be even more pronounced in a global fit, where the gluon can be extracted with higher precision.
The different energy scaling of the SMEFT effects as compared to the QCD ones (polynomial in the former, logarithmic in the latter) can be exploited to disentangle BSM dynamics from QCD ones within the PDF fit. In Fig. 4 we display χ 2 hera /n dat as a function of the cut Q max that fixes the maximum value that enters the χ 2 evaluation. Results are shown both in the SM and in the SMEFT for a u = a d = −1.3 and a s = a c = 0, and in the latter case both for the pre-fit (fixed-PDF) and post-fit cases. While for Q max ∼ > 50 GeV the value of χ 2 hera /n dat is flat for the SM case, there is a rapid degradation in the fit quality for the SMEFT case. This result further highlights that BSM effects cannot be completely "fitted away". Such distinctive trend in the high-energy behaviour of the theory would represent a smoking gun for BSM effects, similar to how BFKL dynamics were recently identified from small-x HERA data [59].
In Table II we indicate the individual and marginalised 90% CL intervals for the SMEFT coefficients from this joint extraction together with the PDFs, see Table I for the fixed-PDF ones. We find that the bounds are rather similar in both cases, consistent with the evidence from Figs. 2-4 that SMEFT corrections are only partially reabsorbed in the PDFs. As expected, the individual limits are somewhat broader at the post-fit level. The marginalised bounds are affected by a sizable statistical uncertainty associated with the finite number of replicas. The latter is estimated by Gaussianly fluctuating the χ 2 tot values of each BP around their central values by their bootstrap uncertainty, and keeping only those fluctuations leaving a positive-definite Hessian. The resulting distribution of fit minima, eigenvalues, and eigenvectors are used to estimate these statistical errors, finding in particular that they are larger than the central value as- sociated to the smallest eigenvalue and therefore that a flat direction, mainly in the (a s − a c ) plane, could not be excluded.  Other studies have quantified the constraints on fourfermion operators such as those of Eq. (2), and a compilation of the information from precision LEP data and low-energy measurements was presented in [22]. In Fig. 5 we compare the 90% CL bounds in the (a u , a d ) plane from our work (both pre-fit and post-fit level) with those from both LEP dijet data and parity measurements. We also show the individual bounds from the former since, contrary to the parity data, these are independent on the modeling of the nucleon structure. We find that our precise bound for a u is comparable to previous studies, while those for a d , a s , and a c are less competitive. This encouraging result emphasizes the potential of high-energy collider data for the simultaneous extraction of both PDFs and SMEFT degrees of freedom.
To summarise, in this work we have systematically analysed the interplay between PDF and SMEFT fits, using the HERA structure functions that provide the backbone of all modern PDF extractions as a case study. Our results represent the successful proof-of-concept of a program aiming to exhaustively disentangle potential BSM effects in high-energy measurements that might otherwise be reabsorbed into the PDFs. The next steps in this program will be to extend our study to a global dataset, including LHC data, and to a wider operator basis.
In this Appendix, we provide additional information about the fits based on theory calculations that include SMEFT corrections. We describe how we map the parameter space associated to the SMEFT four-fermion operators of Eq. (2), and in particular present our choice of benchmark points (BPs). We quantify systematically the impact on the PDFs of the SMEFT corrections for each of the BPs, and also study in further detail their mutual correlation.
First of all, in Table III we list the BPs that have been used to reconstruct the Hessian matrix in Eq. (5), both at the pre-fit (fixed input PDFs) and at the post-fit levels. For each BP, we indicate the values of the total and HERA χ 2 , denoted by χ 2 tot and χ 2 hera respectively, both in the case of a common fixed input PDF set ("pre-fit" column) and once the PDFs have been fitted to the theory based on the corresponding BP ("post-fit" column). Here χ 2 hera contains only the contributions from the neutral and charged-current inclusive structure function data, but not that from the heavy quark structure functions. The pre-fit values of the χ 2 are obtained from the central replica of the NNPDF3.1 NNLO DIS-only set, while those of the post-fit case are computed from the average over the N rep = 300 replicas available for each BP. Both at the pre-fit and the post-fit levels, we also indicate the absolute difference between χ 2 tot in the SMEFT and in the SM, that is, The total number of data points in these fits is n dat = 3092, while the number of data points for the HERA inclusive structure function data (both neutral current and charged current) is n dat = 1145. Recall that throughout this work we assume Λ = 1 TeV. The graphical representation of the results for ∆χ 2 smeft based on the total dataset were displayed in Fig. 2. In Fig. 6 we show the same comparison for the values of ∆χ 2 smeft between the pre-fit and post-fit cases, now restricted to the contribution from the HERA structure functions. The impact of the SMEFT effects in the global PDF fit is clearly seen to be driven by the contributions from the HERA structure function data, and in particular for the large-Q 2 bins. As in the case of the total dataset, we observe how the values of ∆χ 2 smeft decrease in the post-fit case as compared to the pre-fit one, due to SMEFT corrections being partially reabsorbed into the fitted PDFs. From Table III it is possible to identify for which BPs the effects of including the SMEFT corrections is more significant, as indicated by the largest variations in ∆χ 2 smeft , and for which these effects can be neglected. The largest variations are found for BP17, defined by a u = −a d = −1.3 and a s = a c = 0, for which ∆χ 2 smeft 121 (95) at the pre-fit (post-fit) level. For some BPs the values of the χ 2 are actually improved once SMEFT corrections are included, such as for BP25, defined with a u = 0.3, a d = 1.2, and a s = a c − 5.0, for which ∆χ 2 smeft varies from +3.5 at the pre-fit level to −9 at the post-fit level. Note, however that some of the BPs for which ∆χ 2 smeft is largest, such as BP3, BP17, or BP42, are excluded at the 90% CL as indicated in Tables I (pre-fit) and II (post-fit).
Next we summarise the main results of the fits listed in Table III at the PDF level. The differences between the PDFs from the fits based on SM theory and those that include SMEFT corrections can be quantified by means of the PDF pull, defined for example in the case of the gluon as follows: and likewise for other flavours. In Eq. (7), g(x, Q 2 ) indicates the fit central value (average over replicas) for a given SMEFT BP and in the SM, while σ g (x, Q 2 ) corresponds to the standard deviation for the fit based on SM theory. In other words, the pull measures the distortion in the PDF central value induced by the SMEFT corrections in units of the 68% CL SM uncertainty. A value of the pull |P | ≥ 1 would indicate an SMEFT-induced distortion which is not contained within the corresponding PDF 68% confidence level uncertainty band.  For each BP, we indicate the values of the total and HERA-only χ 2 , denoted by χ 2 tot and χ 2 hera respectively, both in the case of a common fixed input PDF set ("pre-fit" column) and once the PDF has been fitted to the theory based on the corresponding BP ("post-fit" column). We also indicate the absolute difference between χ 2 tot in the SMEFT and in the SM, ∆χ 2 smeft . Note that throughout this work we are assuming that Λ = 1 TeV.  Table III. The results for other relevant quark combinations, such as the down valence d V , are similar to those for u V and thus they are not shown explicitly here. In the case of the gluon, one finds that the the overall magnitude of the SMEFT-induced distortion varies with the specific BP. The values of the pulls are contained within the PDF uncertainty bands for all BPs considered here, though for some BPs the value of the pull reaches P 0.5 or even larger. As expected, these pulls are close to zero for x ∼ < 10 −3 , since the small-x region is mostly uncorrelated with the large-Q 2 one where the SMEFT corrections are the largest (see Fig. 1).
The situation is somewhat different for the up valence quark distribution, Fig. 8. In this case, as for the other quark distributions, we find that the differences between the fits based on SM and the SMEFT-augmented calculations are smaller than for the gluon, with |P u V | ∼ < 0.4 for all BPs under consideration. This result can be explained by noting that, within a DIS-only fit, the large-x quarks are constrained mostly from the fixed-target structure function data from the SLAC, NMC, and BCDMS experiments. The kinematic coverage of these fixed-target scattering measurements is restricted to the region Q 2 ∼ < 300 GeV 2 , where SMEFT effects are strongly suppressed. Therefore, large-x quark PDFs are fixed by measurements in a region where SMEFT effects are always small.
In Fig. 9 we display the gluon and up valence quark PDFs at Q = 10 GeV comparing the results obtained using SM theory with those obtained with representative SMEFT BPs, normalised to the central value of the former. We have chosen some of the BPs from Table III for which the PDFs differ the most as compared to the SM result, as quantified by the pulls in Figs. 7 and 8. In particular, we show in this comparison BP6, BP17, BP27, and BP42. In these comparison, the PDF uncertainties correspond to the the 68% CL intervals computed from the corresponding N rep = 100 replicas. From Fig. 9 we can observe that these PDF comparisons are consistent with those of the corresponding pulls. In particular, we note how the SMEFT-induced distortions are smaller for the up valence quark than for the gluon, and how in the latter case these effects are clearly visible especially at medium and large-x.
Interestingly, not all the BPs shown in Fig. 9 lead to large values for ∆χ 2 smeft . As one can read from Table III, while at the post-fit level one has ∆χ 2 smeft = 95 and 64 for BP17 and BP42 respectively, instead one obtains rather smaller differences, ∆χ 2 smeft = 22 and 14, for BP6 and BP27. This result highlights that the SMEFT effects can induce non-negligible modifications into the PDFs even in those cases where the overall fit quality is similar to that of the SM case.
Additional information to understand the results of the PDF fits with SMEFT corrections presented in this work is provided by the correlation pattern between the PDFs and the SMEFT coefficients {a q }. These correlation coefficients can be evaluated, for example in the case of the gluon, as follows  7), for the gluon PDF between the fits based on the SM theory and those that include SMEFT corrections. The comparison is performed at Q = 10 GeV for the benchmark points listed in Table III  or a c . In the case of the up quark, the correlation with the SMEFT coefficients is weak (with |ρ| < 0.4 in all cases), consistent with the moderate impact that using SMEFT-augmented calculations have on the corresponding PDF fits. The correlations are more important for the gluon in the intermediate-x region, reaching ρ 0.6, again mirroring the corresponding results at the PDF fit level. The fact that the correlations have a similar shape but opposite overall sign for a u and a d is a consequence of the fact that their contributions at the structure function level have opposite signs. Note that these correlation patterns are specific of the input dataset and choice of SMEFT operator basis, and will in general be rather different if either of the two were to be varied.