Correlating uncertainties in global analyses within SMEFT matters

We investigate the impact of correlations between (theoretical and experimental) uncertainties on multi-experiment, multi-observable analyses within the Standard Model Effective Field Theory (SMEFT). To do so, we perform a model-independent analysis of $t$-channel single top-quark production and top-quark decay data from ATLAS, CMS, CDF and D0. We show quantitatively how the fit changes when different experimental or theoretical correlations are assumed. Scaling down statistical uncertainties according to the luminosities of future colliders with $300 \, {\rm fb}^{-1}$ and higher, we find that this effect becomes decisive: assuming no correlations returns a fit in agreement with the Standard Model, while a 'best guess' ansatz that takes correlations into account would observe new physics. At the same time, modelling the impact of higher-order SMEFT corrections, we find that these are only a subleading source of uncertainty.

fits. A first toy study of correlations has been discussed in Ref. [8].
In this paper, we work out how correlations of systematic uncertainties and theory uncertainties between measurements change fit results within the SMEFT framework. We consider the example of t-channel single top-quark production together with top-quark decay; it is rather compact due to the small number of contributing Wilson coefficients, while still covering all relevant aspects of a global fit with various observables from different experiments. In the recent past, several studies of the top-quark sector of SMEFT have been performed, see, for instance, Refs. [5-21].
This paper is organized as follows. In Sec. II we introduce the SMEFT framework and effective couplings relevant to our analysis and describe the computations of SM and BSM contributions to single top-quark production and top-quark decays. In Sec. III we discuss the methodology of our analysis and the experimental input. We consider different scenarios for correlations between measurements and demonstrate how such correlations affect the results of fits to current data.
Furthermore we study the impact of correlations for future high-luminosity experiments. In Sec. IV we conclude.

II. SMEFT APPROACH TO TOP-QUARK PHYSICS
The SMEFT Lagrangian $\mathcal{L}_{\rm eff}$ is organized as an expansion in powers of $\Lambda^{-1}$. Higher dimensional operators $O$
In the following we study t-channel single top-quark production cross sections and top-quark decay observables. The following operators contribute at $\mathcal{O}(\Lambda^{-2})$:
where $\varphi$ denotes the Higgs field, $\tilde{\varphi}_i = \epsilon_{ij} \varphi^*_j$ ($\epsilon_{12} = 1$), $q_L$ is the $SU(2)$ doublet, $t_R$ the top-quark $SU(2)$ singlet and $W^I_{\mu\nu}$ and $\tau^I$ are the field strength tensor and the generators of $SU(2)_L$, respectively. Neglecting contributions proportional to masses $m \ll m_t$, where $m_t$ is the top-quark mass, the observables depend on three coefficients:
with $\tilde{C}_i = C_i v^2/\Lambda^2$ and the Higgs vacuum expectation value $v = 246$ GeV. The additional indices denote the quark generation of the $SU(2)_L$ doublets in $O^{(1)}_{qq}$ and $O^{(3)}_{qq}$.
In Sec. II A and Sec. II B we discuss single top-quark production and top-quark decay, respectively.
A. t-channel single top-quark production
We employ the Monte Carlo generator MadGraph5 [22] and the dim6top_LO UFO model [21] to compute SM and BSM contributions to total and differential cross sections of t-channel single top-quark production at parton level at leading order. We validate our results with PYTHIA 8 [23,24] and find good agreement. For all computations we use the MSTW2008lo [25] parton distribution function (PDF) set. To reduce the impact of higher order QCD corrections we take into account SM cross sections at NLO. For differential cross sections we apply k-factors to the SM predictions using the NLO predictions presented in the experimental analyses in Refs. [26-29].
We validate the results by computing the observables at NLO with MadGraph5 using different PDF sets: MSTW2008nlo [25], CT10nlo [30] and NNPDF23_nlo [31]. We find good agreement for all three PDF sets. Total cross sections are computed at NLO using MadGraph5 with the same PDF sets. Renormalization and factorization scales are set to $\mu_{R,F} = m_t$ with $m_t = 172.8$ GeV.
Scale uncertainties are evaluated by varying renormalization and factorization scales independently between $m_t/2 \leq \mu_{R,F} \leq 2 m_t$. We take the maximal variation as the uncertainty. We compute PDF uncertainties with MadGraph5 using the same PDF sets. We take the central value as the estimate and the total $1\sigma$ range, for which we add statistical, PDF and scale-variation uncertainties in quadrature, as the theory uncertainty.
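As a minimal sketch of this combination, assuming symmetric uncertainties, the scale uncertainty can be taken as the largest deviation of the scale-varied predictions from the central one and then added in quadrature to the other components. The function names and all numbers below are hypothetical and purely illustrative; none are taken from the analysis.

```python
import math

def scale_uncertainty(central, varied):
    """Largest absolute deviation of the scale-varied predictions from the central one."""
    return max(abs(value - central) for value in varied)

def theory_uncertainty(stat, pdf, scale):
    """Add statistical, PDF and scale-variation uncertainties in quadrature."""
    return math.sqrt(stat**2 + pdf**2 + scale**2)

# Illustrative cross sections in pb (assumed values, not results of the paper).
central = 100.0
varied = [97.0, 104.0, 99.0, 102.0]  # predictions for other (mu_R, mu_F) choices
scale = scale_uncertainty(central, varied)
total = theory_uncertainty(stat=1.0, pdf=2.0, scale=scale)
print(f"scale: {scale:.1f} pb, total theory uncertainty: {total:.2f} pb")
```

In this toy input the scale variation dominates, so the quadrature sum is only slightly larger than the scale uncertainty alone.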
φq . Contributions from other dimension-six operators are proportional to the mass of the bottom quark m b , and hence neglected. We include BSM contributions at LO and the SM ones at NNLO [32,33].
We count each bin of differential distributions as one observable and include in total 55 measurements of 41 different observables. If differential cross sections are presented in terms of normalized distributions, we reconstruct absolute distributions using total cross sections. We take a constant prior for the parameter interval $-1 \leq \tilde{C}_i \leq 1$ as default.
We consider both a linear and a quadratic fit ansatz for the observables. For the example of total cross sections, the linear one reads
$$\sigma = \sigma^{\rm SM} + \sum_i \tilde{C}_i \, \sigma_i \,, \qquad (4)$$
where $\sigma^{\rm SM}$ denotes the SM contribution and $\sigma_i$ are the LO interference terms at $\mathcal{O}(1/\Lambda^2)$ between SM and BSM. Specifically, in the linear ansatz, the quadratic BSM terms that follow from squaring amplitudes linear in the Wilson coefficients are omitted, as they are formally of higher order, $\mathcal{O}(\Lambda^{-4})$, even though they are induced by dimension-six operators.
The quadratic ansatz reads
$$\sigma = \sigma^{\rm SM} + \sum_i \tilde{C}_i \, \sigma_i + \sum_{i \leq j} \tilde{C}_i \tilde{C}_j \, \sigma_{ij} \,, \qquad (5)$$
where the purely BSM contributions from dimension-six operators $\sigma_{ij}$ contributing at $\mathcal{O}(1/\Lambda^4)$ are kept. To study the performance of the SMEFT-fit in view of the power corrections $v^2/\Lambda^2$, we compare results in the linear approximation, the quadratic approximation and a third EFT-implementation ('linear+$\delta_{\rm EFT}$') based on the linear ansatz, where we add an additional relative systematic theory uncertainty $\delta_{\rm EFT} \sim v^2/(1\,{\rm TeV})^2$ to each observable to model higher order effects.
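The difference between the two ansätze can be made concrete with a short sketch. It assumes that the interference terms are stored as a list and the purely BSM terms as a dictionary keyed by index pairs with i ≤ j; these containers, the function names and all numerical inputs are hypothetical. Only $v = 246$ GeV and the 1 TeV reference scale are taken from the text.

```python
V_GEV = 246.0  # Higgs vacuum expectation value quoted in the text

def sigma_linear(sigma_sm, sigma_int, coeffs):
    """SM term plus interference terms, linear in the Wilson coefficients."""
    return sigma_sm + sum(c * s for c, s in zip(coeffs, sigma_int))

def sigma_quadratic(sigma_sm, sigma_int, sigma_bsm, coeffs):
    """Linear ansatz plus purely BSM terms; sigma_bsm[(i, j)] is given for i <= j."""
    quad = sum(coeffs[i] * coeffs[j] * s for (i, j), s in sigma_bsm.items())
    return sigma_linear(sigma_sm, sigma_int, coeffs) + quad

def delta_eft(scale_gev=1000.0):
    """Order-of-magnitude estimate v^2 / (1 TeV)^2 of the relative EFT uncertainty."""
    return (V_GEV / scale_gev) ** 2

# Illustrative inputs in pb (assumed values, not results of the paper).
coeffs = [0.1, 0.05, -0.2]
sigma_int = [10.0, -20.0, 5.0]
sigma_bsm = {(0, 0): 4.0, (0, 1): 2.0, (2, 2): 10.0}
lin = sigma_linear(100.0, sigma_int, coeffs)
quad = sigma_quadratic(100.0, sigma_int, sigma_bsm, coeffs)
print(lin, quad, delta_eft())
```

For coefficients well below one the quadratic terms are small corrections, and the estimate $v^2/(1\,{\rm TeV})^2$ corresponds to a relative uncertainty of about 6 %.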
In Sec. III A we provide our set-up for correlated uncertainties. Fit results for present and hypothetical future data are presented in Sec. III B and Sec. III C, respectively. In Sec. III D we compare our findings assuming no correlations to results in the literature.

A. Uncertainty set-ups
We consider three different types of uncertainties: statistical uncertainties, systematic uncertainties and theory uncertainties. In the statistical analysis with EFTfitter the uncertainties of all measurements are assumed to be Gaussian distributed. As described in Ref. [34], correlations are taken into account for all types of uncertainties (here: statistical, systematic and theory) by calculating the total covariance matrix $M$ as the sum of the individual covariance matrices cov
where $x_i$ denotes the measurements, $\sigma_{ij}$ are the correlation coefficients and the sum is over all types of uncertainties.
Correlated statistical uncertainties arise if different observables are extracted from the same data set. Corresponding correlation matrices are mostly known from the experimental analyses [26,27,37,38,43,44], and included in our analysis. In contrast, almost no information about the correlation of systematic uncertainties or theory uncertainties is provided. To study their impact on the results of the fit we choose a simple parametrization of the correlation matrices.
In the case of systematic uncertainties, correlations between measurements by the same experiment at the same energy are set to $\rho_{\rm sys}$, since such systematic uncertainties are expected to have the same source. Moreover, we expect the uncertainties of observables measured by the same experiment at different energies to be correlated less, and therefore set these entries to $\rho_{\rm sys}/2$. In contrast, theory uncertainties are not expected to depend on the experiment, but on the energy of the process. Therefore, correlations between measurements at the same energy are set to $\rho_{\rm th}$. Observables measured at different energies are assumed to be correlated with a coefficient $\rho_{\rm th}/2$. In the fit with the linear+$\delta_{\rm EFT}$ ansatz correlations between observables of the same process are set to $\rho_{\rm EFT} = 0.9$. In the linear (4) and quadratic (5) ansatz these correlations are omitted together with the corresponding uncertainties.
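The sum over uncertainty types described above can be sketched numerically. The sketch assumes the standard relation between a correlation matrix and a covariance matrix, in which each entry is the correlation coefficient multiplied by the two corresponding uncertainties; the function names and inputs are hypothetical and this is not EFTfitter code.

```python
def covariance(sigmas, corr):
    """Covariance matrix of one uncertainty type: corr[i][j] * sigma_i * sigma_j."""
    n = len(sigmas)
    return [[corr[i][j] * sigmas[i] * sigmas[j] for j in range(n)] for i in range(n)]

def total_covariance(sources):
    """Sum the covariance matrices of all uncertainty types."""
    matrices = [covariance(sigmas, corr) for sigmas, corr in sources]
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]

# Two toy measurements with assumed uncertainties (not values from the analysis).
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.9], [0.9, 1.0]]
sources = [
    ([2.0, 3.0], uncorrelated),  # statistical
    ([1.0, 2.0], correlated),    # systematic
    ([0.5, 0.5], correlated),    # theory
]
M = total_covariance(sources)
for row in M:
    print(row)
```

The diagonal of the result is the squared total uncertainty of each measurement, while the off-diagonal entries receive contributions only from the correlated sources.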
It should be noted that $F_0$ and $F_L$ are always anti-correlated in our set-up since they are required to add up to $1 - F_R$. In the SM,
In the following analysis we demonstrate the impact of the correlation parameters on the fit results by varying $\rho_{\rm sys}$ and $\rho_{\rm th}$ independently within the interval $[0, 1]$, since positive values for the correlations are expected. We also explored the possibility of negative values but found that in this case the covariance matrix is no longer positive semi-definite. We present results for two benchmark scenarios: the 'no correlation' scenario, which has been adopted in previous studies [5,19,20], where we neglect all unknown correlations,
$$\rho_{\rm sys} = 0 \,, \quad \rho_{\rm th} = 0 \,, \quad \text{('no correlation')} \,, \qquad (7)$$
and the 'best guess' scenario with strong correlations [48],
$$\rho_{\rm sys} = 0.9 \,, \quad \rho_{\rm th} = 0.9 \,, \quad \text{('best guess')} \,. \qquad (8)$$
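Why negative correlation parameters can spoil the covariance matrix can be seen in a toy case, which is an assumption of this sketch and not the 55 × 55 matrix of the analysis: for n quantities sharing one common correlation coefficient ρ, the correlation matrix is positive semi-definite only for ρ ≥ −1/(n − 1). The hypothetical helper below tests the slightly stricter condition of positive definiteness by attempting a Cholesky decomposition.

```python
import math

def equicorrelation(n, rho):
    """n x n correlation matrix with the same coefficient rho between all pairs."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def is_positive_definite(matrix, tol=1e-12):
    """Return True if a Cholesky decomposition of the symmetric matrix exists."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

for rho in (0.9, -0.4, -0.9):
    print(rho, is_positive_definite(equicorrelation(3, rho)))
```

For three quantities the bound is ρ ≥ −0.5, so the check passes for ρ = 0.9 and ρ = −0.4 and fails for ρ = −0.9.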
The correlation matrices for the data given in Tab. I are 55 × 55 dimensional and too large to be given here explicitly. Instead, we illustrate our parametrization with a simplified one. Suppose a dataset with five measurements: the total cross sections of single top-quark production $\sigma(tq)^{A}_{7}$ and single antitop-quark production $\sigma(\bar{t}q)^{A}_{7}$ performed by ATLAS at 7 TeV, the total cross sections $\sigma(tq)^{A}_{8}$ and $\sigma(tq)^{C}_{8}$ measured at 8 TeV by ATLAS and CMS, respectively, and the top-quark decay width $\Gamma_t$. In this example, our parametrization of the correlation matrix of systematic uncertainties
while the one of theory uncertainties is written as
The additional matrix of the $\delta_{\rm EFT}$ uncertainties included in the linear+$\delta_{\rm EFT}$ ansatz reads
with $\rho_{\rm EFT} = 0.9$ in both the 'no correlation' and 'best guess' scenario.
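The parametrization rules of Sec. III A can be sketched for the four cross-section measurements of this example. The decay width is left out because the rules quoted above do not fix its entries, and the choice of zero correlation between systematic uncertainties of different experiments is an assumption of this sketch. All function and field names are hypothetical.

```python
def systematic_correlation(a, b, rho_sys):
    """Same experiment and energy: rho_sys; same experiment, other energy: rho_sys / 2."""
    if a["experiment"] != b["experiment"]:
        return 0.0  # assumption: different experiments are treated as uncorrelated
    return rho_sys if a["energy"] == b["energy"] else rho_sys / 2

def theory_correlation(a, b, rho_th):
    """Same energy: rho_th; different energies: rho_th / 2."""
    return rho_th if a["energy"] == b["energy"] else rho_th / 2

def correlation_matrix(measurements, rule, rho):
    """Build a symmetric correlation matrix with unit diagonal from a pairwise rule."""
    n = len(measurements)
    return [[1.0 if i == j else rule(measurements[i], measurements[j], rho)
             for j in range(n)] for i in range(n)]

measurements = [
    {"name": "sigma(tq), ATLAS 7 TeV", "experiment": "ATLAS", "energy": 7},
    {"name": "sigma(tbar q), ATLAS 7 TeV", "experiment": "ATLAS", "energy": 7},
    {"name": "sigma(tq), ATLAS 8 TeV", "experiment": "ATLAS", "energy": 8},
    {"name": "sigma(tq), CMS 8 TeV", "experiment": "CMS", "energy": 8},
]
sys_matrix = correlation_matrix(measurements, systematic_correlation, 0.9)
th_matrix = correlation_matrix(measurements, theory_correlation, 0.9)
for row in sys_matrix:
    print(row)
```

Keeping the pairwise rule separate from the matrix construction makes it straightforward to scan $\rho_{\rm sys}$ and $\rho_{\rm th}$ independently, as done in the fits.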

B. Fit to data
We present results from fits to the data given in Tab
only slightly affected, we find changes in the 'best guess' scenario compared to the 'no correlation' scenario for both $\tilde{C}^{(3)}_{\varphi q}$ and $\tilde{C}_{qq}$. In the case of $\tilde{C}_{qq}$ the 95 % interval shrinks by a factor of up to 2 while the central value also changes by a factor of up to 1.8, resulting in deviations of up to $4.5\sigma$.
Similarly, the central value of $\tilde{C}^{(3)}_{\varphi q}$ is shifted away from the SM and the 95 % interval grows by a factor of 2. This gives rise to deviations from the SM of up to $4.4\sigma$.
To detail the impact of systematic and theory uncertainties on the constraints we perform fits in which we vary $\rho_{\rm sys}$ and $\rho_{\rm th}$ independently. In Fig. 2 we give the central value of $\tilde{C}$
and to the right of the line deviate from the SM by more than $2\sigma$. We find that correlations of systematic uncertainties have a stronger impact on the constraints than correlations of theory uncertainties, since even in the case $\rho_{\rm th} = 0$ we can still find deviations of more than $2\sigma$ from the SM, while we do not find such deviations for $\rho_{\rm sys} = 0$.
Correlations affect constraints on $\tilde{C}_{qq}$ and on $\tilde{C}^{(3)}_{\varphi q}$, while $\tilde{C}_{tW}$ remains almost unchanged. This is due to the different datasets driving the constraints: $\tilde{C}_{tW}$ is strongly constrained by the helicity fractions, which have weaker correlations among each other and smaller uncertainties than single top production data. In contrast, $\tilde{C}_{qq}$ and $\tilde{C}^{(3)}_{\varphi q}$ are constrained by the differential and total cross sections. These datasets can be strongly correlated, such that the corresponding constraints on the Wilson coefficients can change significantly with the correlation set-up.
In Fig. 3
To validate the stability of our fit, we vary the non-zero off-diagonal entries in the 'best guess' correlation matrices of systematic and theory uncertainties by adding uniformly distributed random numbers $u \in [-0.05, 0.05]$ to the entries. Each element is varied individually while keeping the correlation matrices symmetric. Using the randomized correlation matrices, we perform 3000 marginalized fits to the data given in Tab. I. In Fig. 4 we give histograms for the central value (left) and for the size of the 95 % interval (right) of $\tilde{C}^{(3)}_{\varphi q}$ in the linear EFT-implementations for correlation parameters varied randomly around the 'best guess' scenario (8). The black lines denote the results from the 'best guess' scenario. Compared to the 'best guess' scenario, the distribution of the central values is shifted toward more negative values and is slightly asymmetric, favoring values further away from the SM. The distribution of the size of the 95 % interval is shifted to smaller values and shows an asymmetry, favoring smaller intervals. As in Fig. 3, we observe a smooth and stable dependence on the correlation parameters. Similar results are obtained for the quadratic and
to data in Tab. I with statistical uncertainties scaled to 300 fb$^{-1}$. Dots and lines denote the central value and the smallest 95 % interval, respectively. We find that increasing the luminosity from up to 20 fb$^{-1}$ for the data in Tab. I to 300 fb$^{-1}$ improves the constraints on the coefficients in both correlation scenarios.
In contrast, increasing the luminosity further to 3000 fb$^{-1}$ barely improves the constraints (not shown) due to the dominating systematic and theory uncertainties: in the 'no correlation' scenario results for 300 fb$^{-1}$ and 3000 fb$^{-1}$ are the same up to percent level for all coefficients. In the 'best guess' scenario the results change by up to 5 % for $\tilde{C}$
to the fit to current data correlations of systematic uncertainties have a stronger impact on the constraints than correlations of theory uncertainties.
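The saturation at high luminosity can be illustrated with a short sketch. It assumes that statistical uncertainties fall like the inverse square root of the integrated luminosity, which is the usual expectation for counting statistics but is an assumption here, and it uses invented relative uncertainties rather than values from Tab. I. Function names are hypothetical.

```python
import math

def scale_statistical(stat, lumi_now, lumi_future):
    """Rescale a statistical uncertainty assuming it falls like 1/sqrt(luminosity)."""
    return stat * math.sqrt(lumi_now / lumi_future)

def total_uncertainty(stat, syst, theory):
    """Combine the three uncertainty types in quadrature (correlations ignored)."""
    return math.sqrt(stat**2 + syst**2 + theory**2)

# Assumed relative uncertainties in percent at 20 fb^-1 (illustrative only).
stat_now, syst, theory = 10.0, 5.0, 5.0
for lumi in (20.0, 300.0, 3000.0):
    stat = scale_statistical(stat_now, 20.0, lumi)
    total = total_uncertainty(stat, syst, theory)
    print(f"{lumi:6.0f} fb^-1: stat = {stat:5.2f} %, total = {total:5.2f} %")
```

With these toy inputs the total uncertainty drops markedly between 20 and 300 fb$^{-1}$ but changes only by a few percent between 300 and 3000 fb$^{-1}$, because the fixed systematic and theory components dominate.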

D. Comparison to literature
As a consistency check we compare our results to a recent global SMEFT analysis [7], which provides 95 % confidence level intervals from a fit to single top-quark production and top-quark decay data for the coefficients in Eq. (3). The dataset used in Ref. [7] is very similar to ours given in Tab. I, except for differential cross sections, which are not taken into account in Ref. [7]. To allow for a comparison of results we repeat our fits for the different fit models, linear, linear+$\delta_{\rm EFT}$, and quadratic, defined in Sec. III in the 'no correlation' scenario to the data in Tab. I, excluding differential cross sections. Even though the smallest 95 % intervals in Bayesian statistics differ from confidence intervals in frequentist statistics, used in Ref. [7], we expect them to give very similar results.
Tab. II columns: Operators | 95 % CL [7] | Linear | Linear+$\delta_{\rm EFT}$ | Quadratic
In Tab. II we give the 95 % confidence level intervals from Ref. [7] together with the smallest 95 % intervals obtained in our fits. The results are very similar for all three BSM coefficients in the three different EFT-implementations. Small differences can be expected from the additional NLO QCD corrections to BSM contributions which are included in Ref. [7] and from inflated BSM contributions in our linear+$\delta_{\rm EFT}$ implementation.

IV. CONCLUSIONS
We studied the impact of correlations between (systematic and theory) uncertainties on multi-experiment, multi-observable analyses within SMEFT. Specifically, we performed a first quantitative study of such correlations using the example of t-channel single top-quark production and top-quark decay. This data set allowed us to include 55 measurements from ATLAS, CMS, CDF and D0, given in Tab. I, in an analysis to constrain three Wilson coefficients (3). We considered different scenarios for theoretical and systematic uncertainties by varying two parameters $\rho_{\rm sys}$ and $\rho_{\rm th}$ in the correlation matrices based on simplifying assumptions. We highlighted two scenarios: the 'no correlation' scenario Eq. (7), which has been utilized in previous studies, and the 'best guess' scenario Eq. (8), which incorporates additional correlations between measurements.
Not unexpectedly, correlations change the constraints on the Wilson coefficients significantly.
Without correlations no deviations from the SM are found. In the case of strong correlations the SM prediction is no longer included in the marginalized smallest 95 % intervals of both $\tilde{C}^{(3)}_{\varphi q}$ and $\tilde{C}_{qq}$, see Fig. 1. These deviations can be up to $4.5\sigma$ for $\tilde{C}^{(3)}_{\varphi q}$ and $4.6\sigma$ for $\tilde{C}_{qq}$. On the other hand, different models (linear, quadratic, linear+$\delta_{\rm EFT}$) for EFT-systematics from higher order corrections leave these findings qualitatively untouched, except for $\tilde{C}_{qq}$, where this can be expected.
Correlations become even more crucial for future high-luminosity experiments where the importance of systematic and theory uncertainties is amplified. Assuming central values fixed, the SMEFT-fit leads to significant deviations from the SM at $9.0\sigma$ (linear), $8.9\sigma$ (quadratic) and $9.3\sigma$ (linear+$\delta_{\rm EFT}$) in $\tilde{C}^{(3)}_{\varphi q}$ in the 'best guess' scenario at 300 fb$^{-1}$, see Fig. 5. Benefiting from improvements in statistics beyond 300 fb$^{-1}$ requires improvements in experimental systematics and theoretical predictions.
Our analysis highlights the importance of correlations in global fits, especially for high-luminosity experiments. We suggest considering different correlation scenarios and taking the corresponding variation into account when presenting results of global fits. At the same time, studies along the lines of Ref. [48] are encouraged to provide SMEFT-analyses with the requisite information about correlations of systematic and theory uncertainties. To conclude: "Correlating uncertainties in global analyses within SMEFT matters in the future even more."