Neutral-Current Electroweak Physics and SMEFT Studies at the EIC

We study the potential for precision electroweak (EW) measurements and beyond-the-Standard Model (BSM) searches using cross-section asymmetries in neutral-current (NC) deep inelastic scattering at the electron-ion collider (EIC). Our analysis uses a complete and realistic accounting of systematic errors from both theory and experiment and considers the potential of both proton and deuteron beams for a wide range of energies and luminosities. We also consider what can be learned from a possible future positron beam and a potential ten-fold luminosity upgrade of the EIC beyond its initial decade of running. We use the SM effective field theory (SMEFT) framework to parameterize BSM effects and focus on semi-leptonic four-fermion operators, whereas for our precision EW study, we determine how well the EIC can measure the weak mixing angle. New features of our study include the use of an up-to-date detector design of EIC Comprehensive Chromodynamics Experiment (ECCE) and accurate running conditions of the EIC, the simultaneous fitting of beam polarization uncertainties and Wilson coefficients to improve the sensitivity to SMEFT operators, and the inclusion of the weak mixing angle running in our fit template. We find that the EIC can probe BSM operators at scales competitive with and in many cases exceeding LHC Drell-Yan bounds while simultaneously not suffering from degeneracies between Wilson coefficients.

The Standard Model (SM) of particle physics currently describes all known laboratory phenomena. All particles predicted by the SM have now been found after the discovery of the Higgs boson at the Large Hadron Collider (LHC). No new particles beyond those present in the SM have been discovered and no appreciable deviation from SM predictions has been conclusively observed. Despite the enormous success of this theory, it contains numerous shortcomings. It does not contain an explanation of the dark matter observed in the universe or of the baryonantibaryon asymmetry and it does not describe neutrino masses. It additionally suffers from several aesthetic issues, such as the hierarchy problem and an extreme hierarchy of fermion Yukawa couplings. Even the sectors of the theory that have been experimentally successful still contain unsatisfying and poorly understood features. For example, the exact composition of the proton spin in terms of the spin and orbital angular momentum of its constituent quarks and gluons is still poorly known.
Numerous experimental programs that attempt to address these residual issues in our understanding of Nature are either running or under design. Our focus in this manuscript will be on the Electron-Ion Collider (EIC) to be built at Brookhaven National Laboratory in Upton, New York. The EIC will be a particle accelerator that collides electrons with protons and nuclei in the intermediate-energy range between fixed-target scattering facilities and high-energy colliders. It will provide luminosity orders of magnitude higher than HERA, the only electron-proton collider operated to date. It will also be the first lepton-ion collider with the ability to polarize both the electron and the proton (ion) beams and the first collider with a fast spin-flip capacity. These unique design features will allow direct extraction of parity-violating (PV) asymmetries in the electroweak neutral-current (NC) scattering cross section associated with either the electron, A . Experimental uncertainties from effects such as luminosity measurement and detector acceptance or efficiency will be substantially reduced due to these capabilities.
Although the EIC was designed primarily to explore outstanding issues in QCD such as the proton spin mentioned above, it also has strong potential to probe several aspects of precision electroweak (EW) and beyond-the-SM (BSM) physics. It can measure the value of the weak mixing angle over a wide range of momentum transfer complementary to Z-pole measurements and low-energy determinations. The possibility of polarizing both electron and proton/ion beams gives it unique handles on BSM physics. Our goal in this manuscript is to provide a detailed accounting of the EW and BSM potential of the EIC with a realistic simulation of anticipated experimental uncertainties. We explore the use of the asymmetries A . In addition to determining the BSM reach of PV observables, we consider the reach of the lepton-charge asymmetry A (p(D)) LC at the EIC for the first time, assuming a positron beam will become available in the future.
Since no new particles beyond the SM have so far been discovered, we adopt the Standard Model Effective Field Theory (SMEFT) for our BSM studies (for a review of the SMEFT, see Ref. [1]). The SMEFT contains higherdimensional operators formed by using SM fields, assuming all new physics is heavier than both SM states and the accessible collider energy. The leading dimension-6 operator basis of SMEFT for on-shell fields has been completely classified (there is a dimension-5 operator that violates lepton number, which we do not consider here) [2][3][4]. We find that the EIC can probe the full spectrum of SMEFT operators to the few-TeV level or beyond. The wide variety of observables possible at the EIC, which include several asymmetries with either proton or ion beams, ensure that no flat directions remain in the Wilson coefficient parameter space, unlike at the LHC in the neutral-current Drell-Yan process [5][6][7]. Our analysis of the determination of the weak mixing angle, assuming a realistic annual luminosity and accounting for experimental and theoretical uncertainties to the best level that can be reached at pre-EIC running stage, finds good precision for this fundamental SM parameter in a kinematic region not explored before. The precision will continue to improve as data are accumulated from decades-long running of the EIC.
Our paper is organized as follows. In Section II, first we provide a complete description of deep inelastic scattering (DIS) that includes both SM contributions and the SMEFT extensions. The DIS cross sections that account for both electron and hadron polarizations are provided in both structure-function and parton-model languages. We follow this theoretical framework by presenting a basic strategy to measure different polarization components of the cross sections and to form PV asymmetries at the EIC. Measurement of the lepton-charge (LC) asymmetry is also discussed. In Section III, we present data simulation based on the design of the ECCE Detector (recently endorsed as the reference design for EIC detector 1 by the EIC Detector Proposal Advisory Panel [8]) using a fast-smearing method and eventselection criteria, followed by projections of statistical precision for PV and LC asymmetries based on the planned annual luminosity of the EIC. Generation of pseudodata, as well as the uncertainty matrix, are presented in Section IV, followed by extractions of the EW mixing angle in Section V. In Section VI, we provide an extensive description of our SMEFT analysis framework, with representative results on the fits of single and two Wilson coefficients given in Section VII. We also show an example fit in which six Wilson coefficients are turned on simultaneously, in order to demonstrate that EIC data is capable of removing all degeneracies in the semi-leptonic four-fermion operator parameter space. We conclude in Section VIII. In Appendix A, we present novel analysis methods to simultaneously fit PV asymmetries and the beam polarization or LC asymmetries and the luminosity difference between e + and e − runs. A complete collection of all the SMEFT fit results of single and two Wilson coefficients from this study is given in Appendix B.

A. Deep Inelastic Scattering and the SMEFT Formalism
In this section, we give a brief overview of the formalism of DIS and the SMEFT. In particular, we generalize the SM DIS cross-section and asymmetry formulae to include contributions from SMEFT operators, which encode new physics at an energy level Λ that lies well beyond the electroweak scale. We denote a lepton scattering off a nucleus as: where stands for an electron or positron, the hadron H stands for either the proton (p) or the deuteron (D), and X denotes the final-state hadronic system. The four-momenta of the initial and final leptons and the initial hadron are denoted as k, k , and P , respectively. Using the momenta of the initial-and final-state leptons and the initial-state hadron, one can define the following Lorentz-invariant kinematic variables: where s is the center-of-mass energy squared, Q 2 is the negative of the lepton four-momentum transfer squared, the Bjorken-x variable is the longitudinal hadron momentum fraction carried by the struck parton, the inelasticity parameter y gives the fractional energy loss of the lepton in the hadron rest frame, and W gives the invariant mass of the final-state hadronic system X. The kinematic variables x, y, s, and Q 2 are related to each other via Q 2 = xy(s − M 2 ), where M is the mass of the proton or deuteron. The diagrams in Fig. 1 show the partonic tree-level processes that contribute to Eq. (1). These are the contributions to the total tree-level amplitude from single-photon exchange, single-Z-boson exchange, and the SMEFT contact interactions. The SMEFT Lagrangian that describes these contact interactions has the form: where the summation index r runs over the set of dimension-6 SMEFT operators and the ellipsis denotes SMEFT operators of mass-dimension greater than 6. We restrict our analysis to include only the effects of dimension-6 SMEFT operators since the higher-dimensional operators are formally suppressed by additional powers of E 2 /Λ 2 , where E is the typical energy scale of the scattering process. Although these effects can be important for Drell-Yan production at the LHC [9,10], the low energy of the EIC renders them negligible in this analysis. O r denotes the r th dimension-6 operator and C r is the corresponding (dimensionless) Wilson coefficient arising from integrating out the new-physics degrees of freedom at the scale Λ. These Wilson coefficients can be constrained through a comparison of SMEFT predictions with precision measurements of various processes studied in a variety of experiments across a wide range of energy scales. The subset of dimension-6 operators that we consider in our analysis of DIS is given in Table I. We note that there are additional SMEFT operators but they are known to be far better bounded through other data sets such as precision Z-pole observables [11][12][13] and we neglect them here. The above assumptions leave us with the seven Wilson coefficients associated with the listed operators which enter the predictions for DIS cross sections and asymmetries.
As seen in Table I, the SMEFT operators O r are expressed in terms of the basis of SM fields before electroweak symmetry breaking. For the purposes of DIS phenomenology below the electroweak scale, it is useful to rewrite these SMEFT operators in the vector and axial-vector basis using Dirac fields that describe the massive electrons (e) and quarks (q f ) after electroweak symmetry breaking, which is a customary parameterization (see e.g. [14]): where the specific values of the vector and axial-vector couplings-c e,q V k and c e,q A k , respectively-for the r th SMEFT operator follow from the corresponding chiral and flavor structure of the SMEFT operators. The coefficientsC r are related to the C r by an overall factor and can be fixed by comparing Eqs. (7) and (8). There is freedom to always redefine theC r by absorbing an overall factor into the couplings c e,q Vr , c e,q Ar . We specify in Table I the exact definitions that we use. These couplings are analogous to the vector and axial-vector couplings, g e,q V and g e,q A , of the Z-boson but are instead generated from integrating out UV physics associated with the scale Λ.
As seen in Fig. 1, the total tree-level amplitude can be decomposed into three contributions: where M γ , M Z , and M r denote the contributions from single-photon exchange, single-Z-boson exchange, and the SMEFT operators, respectively. In particular, M r = i M i , where the summation index i runs over the amplitudes arising from the SMEFT operators listed in Table I. Up to leading order in the SMEFT power counting, where only dimension-6 SMEFT operators that scale as 1/Λ 2 are kept, the total amplitude squared can be written as: where These denote the amplitudes of the single-photon exchange, single-Z-boson exchange, the interference between the single-photon and single-Z-boson exhange, the interference between the single-photon exchange and the SMEFT, and the interference between the single-Z-boson exchange and SMEFT, respectively. Here, we ignore the |M r | 2 contribution since it scales as 1/Λ 4 , formally the same size as contributions from dimension-8 SMEFT operators interfering with the SM.
For the hadron-level cross sections and asymmetries, these different contributions will give rise to corresponding structure functions. In particular, in addition to the usual structure functions encountered in the SM DIS, new structure functions corresponding to SMEFT contributions arise. Thus, including the SMEFT contributions, the DIS differential cross section takes the general form: where α is the electromagnetic fine structure constant and L γ,γZ,Z,γr,Zr µν and W γ,γZ,Z,γr,Zr µν are the leptonic and hadronic tensors, respectively. The first three terms on the right-hand side (RHS) correspond to the SM contributions from M γγ , 2M γZ , and M ZZ , respectively, and the last two sets of terms correspond to the contributions from the SMEFT operators, i.e. 2M γr and 2M Zr , respectively. For completeness, below we collect some useful results to make the form of the cross section explicit. The dimensionless coefficients η γ,γZ,Z , ξ γr , and ξ Zr are given by: where G F = 1.1663787(6) × 10 −5 GeV −2 is the Fermi constant and M Z = 91.1876 ± 0.0021 GeV [14] is the mass of the Z boson. The leptonic tensors in Eq. (11) are: where λ e = ±1 denotes the lepton helicity. For positrons, one flips the sign of all the g e A and c e Ar terms and the overall sign of L γZ and L γr above. Using these identities for the leptonic tensors, Eq. (11) can be written more explicitly as: Cr OrCr c e Vr c e Ar c u Vr c u Ar c d,s Vr c d,s Oqe = (QLγ µ QL)(ēRγµeR) Cqe/4 1 -1 1 1 1 1 The coefficients c f V,A give the chiral structure of each operator.
Based on the general Lorentz-tensor structure, the available four-momenta, and the nucleus spin vector, S µ , numerous hadronic tensors are parameterized in terms of structure functions as whereP µ ≡ P µ − q µ (P · q)/q 2 . The index j denotes the possibilities {γ, γZ, Z, γr, Zr} and F j 1,2,3 and g j 1,4,5 denote various unpolarized and polarized nuclear structure functions, respectively. We omit two additional possible Lorentz structures in the hadronic tensor, typically denoted as the polarized structure functions g 2 and g 3 , since these terms give a contribution to the cross section that is suppressed by M 2 /Q 2 when contracted with the leptonic tensor. Therefore, we do not consider the structure functions g 2,3 in the rest of our analysis. The nucleus spin vector S µ satisfies the constraints S 2 = −M 2 and S · P = 0. For longitudinal polarization, it takes the canonical form S µ = λ H (|p|, E p |p| ), where λ H = ±1 is the nucleon helicity and P µ = (E, p) is the nucleon four-momentum.
Based on the structure of the cross section in Eq. (14), in conjunction with the form of the hadronic tensor in Eq. (15), it becomes useful to define the following combinations of structure functions that also include the SMEFT contributions: where the SM contributions are given by the commonly known NC structure functions and the SMEFT contributions are given by: The parton-model expressions for the SM structure functions are summarized below. We also provide the corresponding expressions for the structure functions arising from the interference of the SM with the SMEFT operators: where q f (x, Q 2 ) and ∆q f (x, Q 2 ) are unpolarized and polarized parton distribution functions (PDFs) of quark flavor f , respectively, and Q f denotes the electric charge in units of the proton charge e. In the parton model, at leading order (LO), one has for the structure functions the Callan-Gross relations F i 2 = 2xF i 1 and g i 4 = 2xg i 5 for i = γ, γZ, Z, γr, Zr. For an ion beam (or nuclear target), the neutron PDFs can be related to the proton PDFs by assuming isospin symmetry for the valence quarks: while the charm and strange sea quark PDFs are assumed to be identical for the proton and the neutron: For the deuteron, an isoscalar bound state of a proton and a neutron, the PDFs can be constructed from the proton and neutron PDFs as: for quark flavor f . In terms of the generalized structure functions in Eq. (16), which include dependence on the electron helicity, λ e , as seen in Eqs. (17) and (18), one can write the cross section for given electron and nucleon helicities, including SMEFT operator contributions, as: where we ignore the electron mass and all target-mass correction terms that are proportional to M 2 /Q 2 .
To connect to experimentally measured observables, it is convenient to write the scattering cross section of Eq. (23) as the sum of four components that depend on the spin direction of the initial electron and hadron: dσ 0 , dσ e , dσ H , and dσ eH , where each dσ represents the differential cross section as d 2 σ/(dx dy). The quantity dσ 0 is the unpolarized cross section, dσ e and dσ H denote the cross-section differences between initial electron and hadron states of opposite helicity, respectively, and dσ eH is the cross-section difference between initial electron and hadron states with the same and opposite helicities defined in the center-of-mass frame. These quantities can be formed by using Eq. (23) as: and can be computed in conjunction with Eqs. (16), (17), and (18). The SM contributions to the DIS cross sections with the target-mass terms omitted are: The SMEFT contributions are: If a positron beam becomes available at the EIC, one can measure both e + H and e − H cross sections and study the differences. Neglecting target-mass terms again and writing the SM and SMEFT contributions all together, we have: , (1 − y)c e Ar (ξ γr g γr 4 + 2g e V ξ Zr g Zr 4 ) + xy 2 c e Ar (ξ γr g γr 5 + 2g e V ξ Zr g Zr 5 ) .
In this study, we focus on measurements of both parity-violating and lepton-charge asymmetries. The parityviolating asymmetry can be formed either by comparing right-handed and left-handed electron scattering from unpo-larized hadrons, referred to as the "unpolarized PV asymmetry": or by comparing unpolarized electron scattering off right-handed and left-handed hadrons, referred to as the"polarized PV asymmetry": If a positron beam becomes available in the future, the lepton-charge asymmetry, defined as the unpolarized DIS cross-section asymmetry between electron and positron beams: will provide additional constraints on SMEFT interactions. On the other hand, the double-spin asymmetry, A PV ≡ dσ eH /dσ 0 , is the primary observable to study the nucleon spin structure but is not within the scope of this work. Similarly, a complete list of lepton-charge asymmetries that includes lepton-polarization dependence can be found in [15], but they provide similar constraints to SM and SMEFT studies as the unpolarized asymmetry defined in Eq. 30 and are not discussed in this work.

B. Measurement of Parity-Violating Asymmetries at the EIC
In DIS experiments utilizing an electron beam of polarization P e and a hadron beam of polarization P H , the measured differential cross section is: where P e and P H have the same sign as the respective beam helicities, λ e and λ H , and can take the values −1 P e , P H 1. The various cross-section components in Eq. (31) are given in Eqs. (25). The PVDIS asymmetry can be formed by flipping the spin direction of either the electron beam or the ion beam. For the EIC, beams of opposite polarizations will be injected into the storage rings alternately, and thus each of the signs of both electron and ion polarizations is flipped periodically on a short time scale. This is in contrast to HERA, where data were taken with positive then negative electron polarization, with such long time intervals in between that runs with opposite electron polarizations are essentially two independent experiments.
We express the measured DIS event counts during a certain beam-helicity state as: where L ij stands for the integrated luminosity, and P ij e and P ij H are the electron and the proton (or ion) beam polarizations during the corresponding helicity bunch ij. The superscripts ij = ++, +−, −+, −− represent the electron and the proton helicity states with their time sequence depending on the helicity pattern of the beam injection. The a det factor represents the detector phase space, acceptance and efficiency. In the simplest case, if we assume both beam polarizations, the luminosity, and detector efficiency and acceptance do not vary with time, then: where we define the experimentally measured cross section by dσ ij ≡ N ij /L ij /a det . The PVDIS asymmetry due to the electron spin-flip can be extracted from data by taking the ratio of the cross sections. Because spin-flips of both electron and hadron beams will be carried out at very short time-scale, the factor a det can be assumed constant and cancels out when forming the asymmetry, and we can extract the asymmetry from experimentally measured yields, defined by Y ij ≡ N ij /L ij , as: and that due to proton (ion) spin flip can be similarly extracted as: The design of the EIC requires that the point-to-point luminosity uncertainty be at 10 −4 level. Therefore, the dominant experimental uncertainty would come from electron and proton (ion) polarimetry for A LC , we can reverse the polarity of the magnet to minimize the systematic uncertainty due to differences in e − and e + detection. In this case, the main experimental systematic uncertainty will come from the luminosity difference between e − and e + runs, which is assumed to be 2% (relative in luminosity, absolute in A The EIC Comprehensive Chromodynamics Experiment (ECCE) detector concept [16] addresses the full EIC science mission as described in the EIC community White Paper [17] and the 2018 National Academies of Science (NAS) Report [18]. It is simultaneously fully capable, low-risk, and cost-effective. ECCE strategically repurposes select components of existing experimental equipment to maximize its overall capabilities within the envelope of planned resources. For example, the central barrel of the detector incorporates the storied 1.4-T BaBar superconducting solenoid and the sPHENIX barrel hadronic calorimeter, currently under construction.
For EW NC physics studied in this work, we focus on the detection and identification of inclusive scattered electrons, provided by ECCE's tracking system [19] combined with electromagnetic calorimetry [20] in a nearly hermetic coverage. ECCE features a hybrid tracking detector design using three state-of-the-art technologies to achieve high-precision primary and decay-vertex determination, fine momentum-tracking, and distance-of-closest-approach resolution in the region |η| 3.5 with full azimuth coverage. The ECCE tracking detector consists of the Monolithic Active Pixel Sensor (MAPS)-based silicon vertex/tracking subsystem, the µRWELL tracking subsystem, and the AC-LGAD outer tracker, which also serves as the time-of-flight detector, all optimized by artificial intelligence. For the electromagnetic calorimeter, the system employed by ECCE consists of the PbWO 4 -based Electron Endcap EM Calorimeter (EEMC) for the region −3.7 < η < −1.8, the SciGlass-based barrel ECal for the region −1.7 < η < 1.3, and the Pb-Scintillator shashlik-type forward ECal (FEMC, hadron beam direction) that covers roughly 1.3 < η < 4.
For the inclusive DIS kinematics determination, we use single-electron simulations in the full detector to study the measurement of the electron momentum and trajectory, and characterize the difference between detected and true values as smearing in the electron momentum and polar and azimuthal angles. The smearing can then be applied to simulated events without involving the full detector. This is referred to as fast-smearing and is the simulation method adopted here that yield all physics projections provided in this work. On the other hand, other methods that can be used to identify DIS kinematics, such as detecting all hadrons in the final state, or detecting both the scattered electron and all hadrons, are not investigated here. Similarly, the use of the electromagnetic calorimeter can improve track identification in part of the phase space, but is not included in this work.

B. Simulation with Fast Smearing
We use the Djangoh event generator [21] (version 4.6.16 [22]) that includes full electromagnetic and electroweak radiative effects to generate Monte-Carlo (MC) events for each of the four beam-energy and two beam-type combinations: 18 × 275(137), 10 × 275(137), 10 × 100, and 5 × 100 GeV for ep (eD) collisions, respectively. For the deuterium ion beam, the energy specified is per nucleon. The fast-smearing method is applied to inclusive electron events in the Djangoh output, and the physics cross section and parity-violating asymmetries are calculated event-by-event using a modified user routine of Djangoh. The number of scattered DIS electrons is then calculated using the cross-section information and the expected integrated luminosity after correcting for bin migration.
The detector fast-smearing is obtained from a single-electron gun simulation. Resolution spectra are determined for 57 evenly-spaced bins for the pseudo-rapidity range η = (−3.5625, 3.5625) and 1 GeV-wide bins in the transverse momentum, p T . For each Djangoh-simulated event, smearing in the electron momentum, p, and polar and azimuthal angles θ and φ are randomly picked from the corresponding spectrum and applied to the event, which are used to determine the detected kinematics of the event. While the smearing spectra are not exactly Gaussian-shaped, they are fitted with a Gaussian function. The fitted root-mean-square (RMS) values extracted for illustration purposes are displayed in Fig. 2.

EIC/ECCE preliminary
RMS values for fast-smearing spectra obtained from single electron-gun simulation of July 2021 concept of ECCE. The unit for σ θ and σ φ is radians.
Using the fast-smearing method, we generate 20 M total MC events for each of the beam-energy combinations.Of these 20 M, 10 M events are generated to study the kinematic coverage over the full phase space. The remaining 10 M events are generated with Q 2 min = 50 GeV 2 for which DIS events have the most impact on the extraction of the weak mixing angle. The drawback of the fast-smearing method is that no selection of the hadronic state is implemented. Methods utilizing hadronic final states such as the double-angle method may provide better DIS event identification for certain kinematic range and thus improve precision of the analysis.
Bin migration of inclusive scattering electrons due to internal and external radiative effects is studied with fastsmearing simulation and treated using the "R matrix" unfolding method [23]. Background reactions are studied using the hadronic final state generated by Djangoh (with Q 2 min = 1.0 GeV 2 ), and another Monte-Carlo simulation of photoproduction events are generated by Pythia (version 6.428, with Q 2 min = 0). All events are passed through the full ECCE simulation. We find that the highest background events occur at high y values. These events are rejected at the event-selection stage; see the next section.
We have also studied how our results change if a simple "theory-only" simulation without a detailed detector simulation is performed. We find two major differences with respect to the current analysis: • As mentioned in the next section, we use the inelasticity constraint 0.1 < y < 0.9 in our current simulation.
We find that the regions 0.1 < y < 0.2 and 0.8 < y < 0.9 are not reliably modeled without a detailed detector simulation. Our theory-only simulation cannot accurately reproduce the expected event counts in this region due to missing detector response effects. We therefore must remove these regions, leading to an effective reduction of statistics for the theory-only simulation.
• Secondly, in the 0.2 < y < 0.8 region considered, the total error is relatively 10 to 30% more optimistic in each bin compared to the full detector simulation, with the 30% differences occurring near the boundaries of the y region.
The net result of these two competing effects is that theory-only bounds are up to 10% more optimistic than those found with a full detector simulation.

C. Event Selection
For the 20 M fast-smearing events, event-selection criteria are applied to choose DIS events (Q 2 det > 1.0 GeV 2 ) in order to avoid regions with severe bin migration and unfolding uncertainty (y det > 0.1), to avoid regions with high photoproduction background (y det < 0.90), to restrict events in the main acceptance of the ECCE detector where the fast-smearing method is applicable (η det > −3.5 and η det < 3.5625), and to ensure high purity of electron samples (E > 2.0 GeV). Here, the subscript "det" implies the variables are calculated using the detected information of the electron. The projected values and statistical uncertainties for A

D. Integrated Luminosity
To account for realistic running conditions, the annual luminosity-the "high-divergence configuration" value as shown in Table 10.1 of the EIC Yellow Report (YR) [24], multiplied by 10 7 s-are used. These values are shown in Table II and will be referred to as "Nominal Luminosity (NL)" hereafter. As a comparison with the weak mixing angle extraction presented in the YR, we also carry out projections for 100 fb −1 18 × 275 GeV ep and 10 fb −1 18 × 137 GeV eD collisions as the "YR reference point". We abbreviate the ep pseudodata sets as P1, P2, P3, P4, and P5 and the eD pseudodata sets as D1, D2, D3, D4, and D5; see Table II. The YR reference point is denoted by P6. Simulated pseudodata sets with polarized hadrons are indicated as ∆D1-5 and ∆P1-6, while positron data sets are referred to as LD1-5 and LP1-6 (with "L" for lepton charge).
As an exercise, we consider the additional statistical power that could be obtained by a hypothetical future highluminosity upgrade to the EIC (HL-EIC) that delivers a ten-fold increase in the integrated luminosity (10× higher than those in Table II) for these measurements. As the EIC is not yet built, there is no technical basis to assume that such an upgrade is possible. We choose the factor of 10× luminosity increase to explore the sensitivity of the measurements we study in this paper, without making a comment as to the feasibility of such an upgrade. These projections will be denoted with an "High Luminosity (HL)" label hereafter.

E. Statistical Uncertainty Projection for Parity-Violating Asymmetries
For a given value of integrated luminosity, the statistical uncertainty of an asymmetry measurement is: where N is the total number of events detected, assumed to be approximately equally divided between the two scattering types-between left-and right-handed electron beam, between left-and right-handed proton (ion) beam, or between positron and electron runs. The unfolding process increases the statistical precision only slightly for the region where the relative statistical uncertainty on the asymmetry is most precise. If the asymmetry originates from polarization (as for the case of PV asymmetries), one must correct for the beam polarization: For A (e) PV projections, an electron beam polarization of P e = 80% with relative 1% systematic uncertainty from the electron polarimetry is assumed. Similarly, for A (H) PV projections, a proton (ion) beam polarization of P H = 70% with relative 2% systematic uncertainty from the proton (ion) polarimetry is used. An illustration of the relative precision of PV asymmetries is provided in Figs

F. Statistical and QED Uncertainty Projection for Lepton-Charge Asymmetries
As described in Section II C, to measure the lepton-charge asymmetry A (H) LC , one can reverse the polarity of the magnet to minimize the systematic uncertainty due to differences in e − and e + detection. In this case, the main experimental systematic uncertainty would come from the luminosity difference between e − and e + runs, which is assumed to be 2% (relative in luminosity, absolute in A LC ) in this analysis. If the detector magnet polarity is reversed, then the detection of DIS positrons would be very similar to that of DIS electrons and all the data simulations, event selections, unfolding, etc. described in Section III B apply. The statistical uncertainty in A LC is thus determined by the luminosity of e + run, which we assume to be one-tenth of that of the electron beam. Note that beam polarization and thus polarimetry uncertainties do not affect A LC measurements.
The EW physics reach of A LC is further clouded by the difference in e − vs. e + DIS cross sections due to higherorder QED effects. We calculate the value of A LC using Djangoh version 4.6.19 in both the Born LO (that includes one-boson exchange only) and NLO radiated mode (that includes higher-order EW and QED effects); see Fig. 5. The difference of NLO minus Born values is taken as an estimate of QED NLO effects and the uncertainty is assumed to be 5% relative. Because of the moderate Q 2 reach of the EIC, the 2% absolute uncertainty from luminosity measurement is a dominating systematic effect for the uncertainty of A (H) LC . In Appendix A 1, we present a method to simultaneously fit the luminosity term with SMEFT coefficients; however, we find this method yields 15 to 20% weaker SMEFT constraints.

G. Projection for High-Luminosity EIC
In addition to the nominal luminosity expected for the EIC, we also carry out projections considering the possibility of an additional ten-fold increase in the annual luminosity beyond EIC's initial phase of running, the so-called highluminosity EIC (HL-EIC). Assuming all experimental systematic effects remain the same, we scale the projected statistical uncertainty of asymmetry observables described in the previous section by a factor of 1/ √ 10. For beam energies with lower luminosity (hence larger statistical uncertainty) or asymmetries of smaller sizes such as A (H) PV , the ten-fold increase in luminosity will push the physics reach one step further. On the other hand, for beam energies with already high luminosity and for observables where systematic effects dominate over the statistical ones, such as A PV . The experimental uncertainties are from three sources: statistical, σ stat ; experimental systematic, σ sys , which is mainly due to particle background, also including other imperfections of the measurement, and is assumed to be fully uncorrelated; and beam polarimetry, σ pol , which is assumed to be fully correlated within data of the same √ s and beam type. For the b th bin, with given √ s, x, and Q 2 values and using the nominal PDF set under consideration, first we compute the theoretical SM prediction, (A PV ) theo SM,0,b . Combining the given uncertainties in quadrature separately for uncorrelated and correlated ones, we obtain a pseudo-experimental asymmetry value by: where r b and r are random numbers chosen from a normal distribution of mean 0 and standard deviation 1. Note that the correlated errors are incorporated using a single random number, r , across all the bins. The systematic uncertainties are σ sys /A = 1%, σ pol /A = 1% for A

B. Pseudo Data for Lepton-Charge Asymmetries
We consider next unpolarized electron-positron asymmetries with unpolarized hadrons, namely the lepton-charge (LC) asymmetries. The uncertainties used in the data generation are from three sources: statistical, σ stat ; experimental systematic, σ sys , which is mainly due to background and is assumed to be fully uncorrelated; luminosity difference between e + and e − runs, σ lum , which is fully correlated within data of the same √ s and ion beam type; and higherorder QED effects, σ QED NLO , taken as 5% of the difference between the calculated NLO and Born (LO) A LC values.
In analogy with Eq. (44), for the LC asymmetries, we write:

C. Uncertainty Matrix
The uncertainty matrix, Σ 2 , for a given data set with N bin bins is an N bin × N bin symmetric matrix. It consists of two parts, which we call Σ 2 0 and Σ 2 pdf : The first part of the matrix, Σ 2 0 , is constructed using all the uncertainty components (statistical, systematic, polarimetry or luminosity, and QED NLO if present) other than the PDF uncertainties. All the uncertainties that enter Σ 2 0 must be absolute; relative uncertainties are converted to absolute ones by multiplying with the theoretical SM prediction, A theo SM,0,b , computed using the central member of the PDF set taken into account. The first part of the matrix then takes the form: where, for the PV asymmetries, we have for the diagonal elements: and for the off-diagonal elements: For the LC asymmetries, we have for the diagonal elements: and for the off-diagonal elements: Here, b and b are bin numbers and we assume full correlation for uncertainties originating from beam polarimetry or luminosity: ρ bb = 1 for all b and b . The second part of the uncertainty matrix, Σ 2 pdf , is built using the same procedure for both PV and LC asymmetries by taking into account differences between the theoretical SM asymmetry prediction computed at the nominal PDF member, A theo SM,0 , and theoretical SM asymmetry predictions evaluated at all other members of the PDF set under consideration, A theo SM,m , where m = 1, 2, . . . , N PDF with N PDF the total number of PDF sets or replicas available. For Hessian-based PDF sets, the diagonal and off-diagonal elements can be collectively written as: For replica-based PDF sets, this expression becomes:

D. Comparison of uncertainty components
We present in this section the various uncertainty components that enter the SMEFT analysis. We also investigate the total uncertainties combined in quadrature that contribute to the diagonal entries of the uncertainty matrix.

Individual uncertainty components
We begin by considering the individual components of the uncertainties. We investigate the effects of sea quarks in the analysis by defining a valence-only approximation for the PDFs. The tag ud in the plot labels implies the valenceonly approximation, in which only up-and down-quark contributions are considered in the hadronic cross section, whereas uds indicates that up, down, strange, and their antiquarks are taken into account. Note that for the data sets involving unpolarized deuteron with the ud tag, there will be no uncertainty from PDFs since deuteron PDFs, defined in terms of proton and neutron PDFs using isospin symmetry, cancel when analytically forming asymmetries in the valence-only approximation. Note also that for experimental systematic uncertainties other than those from beam polarimetry, both 1% and 2% values are shown in all figures of this section, although the 1% value is used in the results presented. Fig. 6 shows the comparison of the uncertainty components for the data set D4 in the ud and uds scenarios. As for the PDFs, we use NNPDF3.1 NLO [25] in the unpolarized case and NNPDFPOL1.1 [26] in the polarized case throughout. Only the (x, Q 2 ) region relevant for SMEFT analysis is shown, although the full region is used for the extraction of the weak mixing angle. The x-axis of these plots is ordered by bin number; these are ordered first from low to high Q 2 , then from small to large x within each Q 2 bin, leading to the observed oscillatory behavior. When we turn on the sea quark contributions, the unpolarized deuteron data sets receive nonzero but highly suppressed PDF uncertainties, indicating that the assumption of deuteron PDFs completely canceling is a reasonably good approximation. The right panel shows that even after including sea quarks, the PDFs are still the smallest uncertainty component. This indicates Comparison of the uncertainty components for the data set D4 in the valence-only scenario (ud) and with the contributions from the sea quarks (uds). Here, "NL" refers to the currently planned annual luminosity of the EIC, while "HL" refers to a potential ten-fold luminosity upgrade.
that potentially poorly determined sea quark and strange quark distributions have little effect on this analysis. The largest single uncertainty component is the statistical uncertainty (shown as a dark red line). This is larger than both the 1% beam polarization uncertainty (light blue line), and either of the 1% or 2% uncorrelated systematic uncertainty assumptions (solid and dotted blue lines, respectively). When we switch to the high-luminosity (HL-EIC) scenario (dotted red line), the statistical uncertainty becomes comparable to the systematic ones. All uncertainties are significantly smaller than the predicted values of the asymmetry, shown as the solid black line in the plots. In Fig. 7, we display the different contributions to the diagonal entries of the uncertainty matrix of the data sets P5 and ∆P5. The pattern of uncertainties for P5 is very similar to that observed for D4. The statistical ones are the largest single uncertainty source, while the PDFs are the smallest. Assuming high luminosity, the statistical uncertainties become comparable to the anticipated systematic ones. The pattern is different for ∆P5: the statistical uncertainties are largest for all bins, even assuming high luminosity. The PDF uncertainties are also non-negligible, consistent with the expectation that spin-dependent PDFs are not known as precisely as the spin-independent ones. The anticipated experimental systematic uncertainties are negligible for all bins.
Finally, we show in Fig. 8 the individual uncertainties for the electron-positron asymmetry data set LP5. The error budget is different for this scenario compared to PV asymmetries. Since both beams are unpolarized, there is no uncertainty related to beam polarization. However, since electron and positron runs occur with different beams, there is the possibility of a significant overall luminosity difference between the two runs that can lead to an apparent asymmetry. We assume an absolute 2% uncertainty, two times the luminosity uncertainty requirement of [27]. Finally, we consider the possible errors arising from higher-order QED corrections that may differentiate between electron and positron scattering. We estimate this uncertainty by taking 5% of the difference between the Born-level and NLO QED results, obtained by using Djangoh. The two largest sources of uncertainty throughout the entire kinematic range are the luminosity and statistical uncertainties. PDFs, higher-order QED, and anticipated systematic uncertainties are all significantly smaller.
Summarizing all the figures presented in this section, we can make the following main points: • The expected statistical uncertainties are the dominant ones for the nominal EIC luminosity. If a high-luminosity (HL-EIC) upgrade becomes realistic, they become comparable to experimental systematic uncertainties for PV LC , particularly at low x and low Q 2 . On the other hand, uncertainties from missing higher-order QED corrections are expected to be small.

Total uncertainties for nominal luminosity vs. high luminosity
We now investigate the total uncertainties for the data sets D4, ∆D4, P5, and ∆P5. We consider four different scenarios: the nominal annual luminosity planned for the EIC or a potential high-luminosity upgrade beyond the initial phase of the EIC run, combined with 1% or 2% relative experimental systematic uncertainties due to particle background. We show the results in Figs. 9 and 10. We observe first that the dominant uncertainty component in all cases is the statistical one. The four uncertainty scenarios, namely 1% or 2% systematic uncertainties combined with nominal or high luminosity, can be in fact reduced to just the luminosity comparison, i.e. nominal vs. high. Next, for both D4 and P5, the asymmetry A (e) PV is measured to percent-level throughout the considered phase space. This is not the case for the polarized sets ∆D4 and ∆P5. Particularly in the ∆P5 scenario at low Q 2 , the anticipated errors are larger than the asymmetry for all choices of systematic error and luminosity. Only in the very high Q 2 bins does a measurement of the asymmetry A (H) PV become meaningful. Our evaluation of the uncertainties indicates that using 1% or 2% relative systematic uncertainties makes practically no difference, as the total errors are mostly dominated by the statistical uncertainties for the PV asymmetries or the luminosity difference for the LC asymmetries. We also show that one can take into account the contribution of only the valence quarks to the asymmetries or include the sea quarks up to strange flavor and its antiquark, both of which lead to the same size of PDF errors for the data sets under consideration. In our best-fit analyses, we thus focus on the data sets with 1% relative systematic uncertainty and nominal luminosity in the uds scenario as our main data sets. Comparisons are performed to the ones having high luminosity, keeping the rest of the configuration the same.
An important issue to address is whether a joint fit of PDFs and Wilson coefficients would change the potential of the EIC to probe the SMEFT parameter space that we find in this draft. This issue has been studied for both HERA and LHC dat sets in the literature [28,29], where it is found that the interplay between PDFs and Wilson coefficients can become a significant challenge for some future high-luminosity measurements at the LHC. It is beyond the scope of this paper to consider such a joint fit, so we can only speculate regarding the exact answer to this question. However, we can make the following points that are supported by the uncertainty plots in this section of this manuscript. • For the unpolarized deuteron data sets, the PDF uncertainties are an order of magnitude smaller than the statistical uncertainties for the nominal luminosity and three to five times smaller than the high-luminosity statistical uncertainties, as shown in Fig. 6. We therefore expect that a joint fit of the PDFs and Wilson coefficients would not greatly effect the bounds obtained here. The PDFs are already determined sufficiently well from other experiments for the purposes of this analysis,.
• The same statement holds for the P5 unpolarized data sets for both nominal and high luminosities, as shown in the left panel of Fig. 7. This plot indicates that taking ratios to form asymmetries, as we do in this study, greatly reduces the dependence on PDFs.
• Looking at the right panel of Fig. 7, we see that the polarized proton PDF error becomes comparable to the statistical error at high Q 2 for the high-luminosity data set. In this case, a joint fit of polarized PDFs and Wilson coefficients will be especially important. We note that the bounds from the polarized hadron data sets are generically much weaker than those for the unpolarized hadron sets (see Fig. 13 for example), since the polarized asymmetry is much smaller than the polarized one. We believe that our main point regarding the EIC sensitivity to SMEFT Wilson coefficients is mostly unaffected by this point.

V. EXTRACTION OF THE SM WEAK MIXING ANGLE
The weak mixing angle, often written as sin 2 θ W , is a fundamental parameter of the SM and has been measured in experiments ranging from atomic parity violation at eV energy levels to high-energy colliders at the Z-pole [30][31][32]. The EIC will provide constraints on sin 2 θ W in the intermediate energy range that resides between the reach of fixed-target and collider facilities.
For the extraction of the weak mixing angle, we focus on A (e) PV , where sin 2 θ W enters through the electron coupling g e V,A and the corresponding quark couplings in the structure functions. We also include the one-loop renormalization group evolution [33] of sin 2 θ W in the MS scheme, including the relevant particle thresholds that arise between µ = M Z and µ = Q 2 . Including target-mass correction terms, we can write: |P e |η γZ g e A 2yF γZ where M is the nucleon mass. Note that given the moderate Q 2 values of the EIC, the pure-Z contribution to the structure functions is omitted for the precision relevant to our analysis. A single pseudodata set is generated using a reference value of sin 2 θ W = 0.231 at the Z-pole and the uncertainties in A (e) PV in each (x, Q 2 ) bin are obtained from simulation studies. Comparing the theory prediction to the pseudodata, a best-fit value and uncertainty projection for sin 2 θ W at the Z-pole are obtained by minimizing the χ 2 function defined as: where A is a dimension-N bin vector with N bin the total number of (x, Q 2 ) bins, Σ 2 is the uncertainty matrix of dimension N bin ×N bin , described in Section IV C, and sin 2 θ W to be fitted enters A theory . The PDF portion of the uncertainty matrix is evaluated using the PDF sets CT18NLO [34] (LHAPDF [35] ID 14400-14458), MMHT2014nlo 68cl [36] (ID 25100-25150), and NNPDF31 nlo as 0118 [25] (ID 303400-303500).
Our results for sin 2 θ W are shown in Tables III and IV for five energy and nominal-annual-luminosity combinations of ep and eD collisions, respectively. These results are illustrated in Fig. 11. The inner error bars show the combined uncertainty from statistical and 1% uncorrelated experimental systematics (due to particle background); the median error bars show the experimental uncertainty that includes statistical, 1% uncorrelated experimental systematics, and 1% electron polarimetry. The outermost error bars, which almost coincide with the median error bars, include all the above and the PDF uncertainty evaluated using the set CT18NLO. Results evaluated with the sets MMHT2014 and NNPDF31NLO are similar. Along with our projection with the EIC annual nominal luminosity, we show the "YR reference point" (blue diamond), obtained from combining 100 fb −1 18 × 275 GeV ep and 10 fb −1 18 × 137 GeV eD pseudodata. Also shown are the expected precision from near-future P2 [37], MOLLER [38], SoLID [39], and PVDIS [40,41] experiments, respectively, that will dominate the landscape of low to medium energy scales.
We note that our results have larger uncertainties than in the YR [24], which fits PDFs and sin 2 θ W simultaneously using the JAM framework [42], possibly due to using realistic detector simulation and accurate running conditions. On the other hand, we find that PDF uncertainties are likely not the dominant ones for the EIC projections, but the electron polarization is, for the settings where the integrated luminosity approaches 100 fb −1 . Consequently, upgrading the luminosity of the EIC does not bring significant improvement on the uncertainty of sin 2 θ W , and therefore we do not show our fitting results for the ten-fold luminosity upgrade. Here, Q 2 denotes the value averaged over all (x, Q 2 ) bins, weighted by (dA/A) −2 stat for each bin. The electron beam polarization is assumed to be 80% with a relative 1% uncertainty. The total ("tot") uncertainty is from combining all of statistical, 1% systematic (background), 1% beam polarization, and PDF uncertainties evaluated using three different PDF sets. The rightmost column is for comparison with the YR.
Beam type and energy eD 5 × 100 eD 10 × 100 eD 10 × 137 eD 18 Our results show that the EIC will provide a determination of sin 2 θ W at an energy scale that bridges higher-energy colliders with low-to medium-energy SM tests. Additionally, data points of different √ s values of the EIC can be combined or the Q 2 -dependence of the EW parameter can be explored, depending on the runplan of the EIC. Furthermore, one could study the exploratory potential of the EIC beyond the scope of a single SM parameter, and we provide results using the SMEFT framework in the next section.

A. Data Generation and Selection
We use the procedure described in Section III to determine the uncertainty of our data projection and the uncertainty matrix. We consider both ep and eD collisions and concentrate on the two highest-energy settings listed in Table II. Because collisions with higher center-of-mass energy are more sensitive to SMEFT operators, we choose four data families with the two highest √ s to focus on: For the highest √ s but lower-luminosity set D5, ∆D5, P5, and ∆P5, we consider two scenarios: the nominal luminosity as indicated above and in Table II, and the high luminosity option denoted with an "HL" label with ten-fold higher statistics.
We use Eq. (44) to generate N exp = 1000 pseudodata sets for each of the data families. We then impose the following selection criteria on the bin points, x and Q 2 , and the inelasticity, y: x < 0.5 , Q 2 > 100 GeV 2 , 0.1 < y < 0.9 .
These restrictions are designed to remove large uncertainties from non-perturbative QCD and nuclear dynamics that occur at low Q 2 and high x, where sensivity to SMEFT effects is anyways expected to be reduced. We note that the condition on y is already applied in the data generation and unfolding stages described in Section III C.

B. Structure of the SMEFT asymmetry corrections
In the computation of SMEFT asymmetry values, A SMEFT , we use the central member of the PDF set under consideration. We use the PDF sets NNPDF31 nlo as 0118 [25] and NNPDFpol11 100 [26] for the computation of  Table 10.1 of the Yellow Report [24], along with existing world data (red solid circles) and near-future projections (green diamonds); see text for details. Data points for Tevatron and LHC are shifted horizontally for clarity. The script used to produce this plot is inherited from [43]. The scale-dependence of the weak mixing angle expected in the SM (blue curve) is defined in the modified minimal subtraction scheme (MS scheme) [33].
unpolarized and polarized PV asymmetries, namely A PV , respectively. We factor out the UV cut-off scale from all the seven Wilson coefficients, C r → C r /Λ 2 , and set Λ = 1 TeV. We turn on only one or two Wilson coefficients at a time and set the remaining ones to zero and linearize the SMEFT expressions with respect to the Wilson coefficient(s) of interest. SMEFT asymmetry expressions then generically take the form: or Comparing Eq. (57) to (44) or (45), we see that at the end of a multi-run analysis, the distribution of the best-fit values for any single Wilson coefficient should be a Gaussian centered at the origin.

C. Best-fit analysis of Wilson coefficients
Generating pseudo-data values, A pseudo SM , and obtaining the SMEFT asymmetry expressions, A SMEFT , we define a χ 2 test statistic as: where N bin is the number of bins in a given data set. Generically, it looks like: for a single-parameter fit of Wilson coefficient C, or: for a two-parameter fit of Wilson coefficients C 1 and C 2 . The χ 2 function is minimized with respect to C or to C 1 and C 2 . This gives us the best-fit values,C orC 1 andC 2 . We obtain the inverse square of the error of the single-parameter best-fit value via: evaluated atC. The inverse covariance matrix, V −1 , of the two-parameter fit is constructed in such a way that its ij th component is given by: for i, j = 1, 2, evaluated at the best-fit values of C 1 and C 2 . Inverting V −1 , we obtain the individual errors and the correlation of the fit:

Averaging over multiple pseudodata sets
When we repeat N exp times the single-parameter best-fit analysis described in Sec. VI C, we obtain N exp best-fit values,C e , with corresponding uncertainties, σ C,e , for each pseudo-experiment e. The mean of the best-fit values is obtained by averaging individual best-fit values weighted by the inverse square of the uncertainties: and the average uncertainty of this mean value is obtained via: When we repeat N exp times the two-parameter best-fit analysis on Wilson coefficients described in Sec. VI C, we obtain N exp pairs of best-fit values,C 1,e andC 2,e , and inverse covariance matrices, (V −1 ) e , for each pseudo-experiment, e.
The best-fit values are averaged similarly to the one-dimensional case but with the inverse square of uncertainties replaced by inverse covariance matrices: The average inverse covariance matrix of the resulting best fit is calculated using: We note the presence of the factor 1/N exp in Eqs. (66) and (68). Without it, we would be effectively increasing the luminosity of the corresponding central data set by a factor N exp . We avoid this by including this factor in computing the average uncertainty or inverse covariance matrix.

Definition of confidence intervals
The result of a single-parameter multi-run fit can be expressed as: and hence we can express the fitted result and the uncertainty of coefficient C as: For a two-parameter multi-run fit, the ellipse equation reads: in the (C 1 , C 2 ) plane. The ∆χ 2 values that determine the size of the best-fit region for an arbitrary confidence level are well-known. For 95% CL, we have ∆χ 2 = 3.841, 5.991, and 7.815 for one-, two-, and three-parameter fits, respectively.

Combination of best-fits from distinct data sets
Suppose we have two data sets, say T1 and T2, from which we obtain the single-parameter best-fit values of Wilson coefficient, C, written asC T1 andC T2 , together with the errors σ C,T1 and σ C,T2 . Assuming these data sets can be treated as uncorrelated to a good approximation, we obtain the combined best-fit value and the corresponding uncertainty by using Eqs. (65) and (66) with slight modifications. First, the summation index e now runs from 1 to 2, representing the number of data sets combined. Second, the factor 1/N exp should be removed from Eq. (66) because we now have indeed two independent, uncorrelated measurements. This method can be generalized to the combination of the best-fit values from more than two data sets, such as different beam energies, and to the case of multi-parameter fits in a straightforward manner.

Simultaneous fit of Wilson coefficients and beam polarization or luminosity difference
We observe in Section IV D that experimental uncertainties such as the beam polarization and luminosity difference between e + and e − runs can be limiting factors for some of the data sets. When the data statistical uncertainty is very precise, there is the possibility that one can use the data themselves to constrain these systematic effects. We present in Appendix A 1 a method to simultaneously fit the SMEFT coefficient(s) and the luminosity difference for the LC asymmetries and in Appendix A 2 a method to simultaneously fit the SMEFT coefficient(s) and the beam polarization for PV asymmetries.

A. Fits of single Wilson coefficients
In this section, we discuss the 95% CL intervals for the Wilson coefficients in single-parameter fits averaged over 1000 pseudo-experiments. The bounds on the Wilson coefficient C eu across numerous data sets are representative and exhibit the common features of fits of single Wilson coefficients. We therefore show only the bounds on C eu to illustrate the main observations and include the remaining Wilson coefficients in Appendix B 1. Fig. 12 displays the 95% CL intervals of C eu for the four data families in which we are primarily interested in this paper. The intervals are grouped by asymmetries, namely electron PV asymmetries, A  From Fig. 12, we can extract the following main points: • Proton asymmetries of all the three types, namely unpolarized and polarized PV asymmetries and LC asymmetries, impose considerably stronger bounds than deuteron.
• High-energy low-luminosity data sets D5 and P5 lead to slightly weaker bounds than the less-energetic but higher-luminosity ones, D4 and P4, respectively.
• Unpolarized PV asymmetries, A PV ; however, it should be noted that for some Wilson coefficients, unpolarized proton asymmetries yield nearly the same bounds as the corresponding polarized ones.
• Data sets in the high-luminosity scenario make a noticeable difference in the size of bounds. The improvement due to increased luminosity is slightly more significant for polarized deuteron asymmetries.
• Bounds from electron-positron asymmetries, A LC , are comparable to or looser than the ones from polarized hadron asymmetries. They never offer stricter bounds than high-luminosity hadron PV asymmetries.
• If the beam polarization is introduced as a new fitting parameter, unpolarized hadron asymmetries give considerably stronger bounds. The improvement is more significant in the high-luminosity scenario. However, the same fitting method yields weaker bounds with polarized hadron asymmetries. We explain this finding in the Appendix A 2.
Assuming weak correlations, one can also combine the bounds within a given family of data sets, e.g. D4, ∆D4, and LD4. We find that the resultant bound is never stronger than the strongest one obtained from the individual family members, which is always from the electron PV asymmetry data set. We note that this observation holds for other Wilson coefficient choices and for the two-dimensional fits in the next section, as well.
In Fig. 13, we present the effective UV cut-off scales, Λ/ √ C eu , with Λ = 1 TeV, corresponding to the bounds shown in Fig. 12. The organization of this plot in terms of asymmetries and data sets is the same as in Fig. 12. Improved bounds on C eu with the addition of the beam polarization to the fits are equivalent to higher energy scales in the unpolarized PV asymmetries, which are indicated by the lighter columns in the background; on the other hand, weaker bounds from the fits with beam polarization are depicted by the lighter columns in the foreground for the polarized PV asymmetries. One can observe that scales reaching 3 TeV can be probed with nominal luminosity, while scales exceeding 4 TeV can be probed for other luminosities. We remark that care must be taken in comparing these mass limits with others found in the literature, which sometimes assume a strong coupling limit, equivalent to setting C r = 4π, and also maximally constructive interference between different quark contributions. For example, converting our results to the notation of [44] would yield a bound on Λ/ √ C eu of 19 TeV, instead of 3 TeV quoted here, which is only very approximate and is calculated by multiplying 3 TeV by √ 4π and √ 5, where the latter is to account for the constructive interference between quark contributions, and by another factor to convert 90% CL to 95% CL.

B. Fits of two Wilson coefficients
In this section, we discuss fits on pairs of Wilson coefficients in order to determine how well the EIC can break degeneracies between parameters that occur in the LHC Drell-Yan data [6,45]. We emphasize that the representative examples shown in this section are the results of the simultaneous fits with beam polarization in light of the significantly improved results of the (1 + 1)-parameter fits in the previous section. The description of the beam-polarization fits is presented in Appendix A 2. The complete set of plots of confidence ellipses is given in Appendix B 2.
In Fig. 14, we compare the 95% CL ellipses for the pair (C eu , C qe ) between the data families D4 and P4. Each asymmetry type gives a distinct correlation pattern, complementary to one another. Electron-positron asymmetries give rise to wide and elongated, band-like ellipses compared to PV asymmetries. As in the case of (1 + 1)-parameter fits, electron PV asymmetries of unpolarized hadrons offer the strongest bounds on the pairs of Wilson coefficients. Comparing deuteron to proton, one can see that proton data are significantly more constraining.     15. 95% CL ellipses for the Wilson coefficients Ceu and Cqe using the data sets D4 and P4 in the (2 + 1)-parameter fit that includes the beam polarization as an additional fitting parameter, compared with the corresponding two-parameter fit from the LHC data [45].
Finally, in Fig. 17, we present the fits from the P4 data set and the LHC adapted from [6] for the pair (C q ). This figure shows that when the LHC data imposes tight bounds on a pair of Wilson coefficients, the EIC preliminary data can introduce far stronger bounds on the same pair of Wilson coefficients. Moreover, the fits from EIC and LHC have distinct correlations, which indicates the complementarity of the EIC to the LHC as a future collider. Treating q using the nominal-and high-luminosity data set P4 in the (2 + 1)-parameter fit that includes the beam polarization as an additional fitting parameter, compared with the corresponding two-parameter fit from the LHC data [6].
the projected EIC and the LHC data to be uncorrelated, we also plot the combined fit of the two, which turns out to even more strongly constrain the chosen pair of Wilson coefficients. We remark that the effective UV scales probed with the combined data set exceed 2 TeV.   q using the nominal-luminosity data set P4 in the (2 + 1)parameter fit that includes the beam polarization as an additional fitting parameter, compared with the corresponding fit from the LHC data [6] and the combined fit of the two.
It should be noted that there appear flat directions in the fits of certain pairs of Wilson coefficients with the projected EIC data that utilize the deuteron beam. Examples include (C eu , C ed ) and (C u , C d ). We can explain these observations analytically. We find that these pairs always appear in a specific way in asymmetry expressions, for example 2C eu − C ed for electron PV asymmetries with unpolarized deuteron. In all such cases, only one of the data families exhibits this behavior, with the degeneracy broken by another data family.
Our results on the bounds from Wilson coefficients in simultaneous (2+1)-parameter fits with the beam polarization as an additional parameter can be summarized as follows: • Proton asymmetries impose much stricter bounds than deuteron.
• Unpolarized hadron asymmetries lead to stronger correlations than the polarized ones.
• The three types of asymmetries of deuteron and proton considered in this work, together with the LHC data, are complementary to each other in the sense that they offer distinct correlation patterns.
• The projected EIC data are capable of resolving all flat directions that appear in the LHC Drell-Yan data.
• The bounds from the projected EIC data can be much stronger than the LHC data, indicating that the EIC has an important role to play in future probes of the SMEFT.
We can ask what happens when more than two Wilson coefficients are turned on simultaneously. We study this in Section B 3, where we turn on six Wilson coefficients. The resulting bounds are 20 to 30% weaker than the ones found here in the 2d case. We note that no flat directions appear in these fits, indicating that the EIC can fully probe this parameter space without degeneracies.

VIII. CONCLUSIONS
In this manuscript, we have analyzed the potential of testing the electroweak SM and exploring BSM physics with the future EIC. We have focused on the precision determination of the weak mixing angle over a wide range of momentum transfer and on probes of heavy new physics. We have provided all the formulae for neutral-current DIS and simulation details that will be needed for future studies of these areas. Our BSM analysis utilizes the model-independent SMEFT framework and focuses on the semi-leptonic four-fermion operator sector of the theory. We translate our formalism into the DIS language in terms of parity-violating couplings and structure functions to facilitate cross-talk between the high-energy-physics and nuclear communities. We provide a detailed accounting of uncertainties from statistics, experimental systematic effects, beam polarimetry for parity-violating asymmetries, higher-order QED corrections for lepton-charge asymmetries, and finally PDFs. Additionally, we explore simultaneously fitting the beam polarization with the anticipated high-precision parity-violating asymmetry data as a possible analysis technique to improve upon the experimental limitation from beam polarimetry.
Our BSM analysis finds that UV scales in excess of 3 TeV can be probed with the currently planned (nominal) annual luminosity of the EIC, with scales above 4 TeV possible if a ten-fold luminosity upgrade becomes available beyond EIC's initial decade of running. For the latter, we focus on studying the physics reach and limitations from sources other than statistical, without comment as to the feasibility of such an upgrade. The most stringent bounds come from polarized electron scattering off of unpolarized protons. Constraints from polarized hadrons, deuterons, and from a possible future positron beam provide important complementary probes. Our complete study of correlations between Wilson coefficients finds that no degeneracies remain upon combining all EIC data sets. This is not the case with LHC Drell-Yan measurements, in which numerous degeneracies exist, and will continue to occur even after LHC's high luminosity running.
This demonstrates that the EIC polarization provides a powerful probe of BSM effects. Although the EIC is primarily thought of as a QCD machine, it is also a powerful probe of potential new physics, in many ways complementary to the higher-energy LHC. We note that current global fits of LHC data, for example to top-quark and Higgs data, probe orthogonal sets of Wilson coefficients [46]. However, the strongest LHC constraints on the semi-leptonic four-fermion sector of the SMEFT come from the Drell-Yan data. Given that LHC Drell-Yan cross sections are blind to certain combinations of Wilson coefficients, we envision that high-precision EIC data will help remove these degeneracies in global fits, both at the present moment and when high-luminosity LHC Drell-Yan data become available. We hope that our work motivates future studies of the unexpected power of the EIC for new-physics searches.
Appendix A: Additional fits

Luminosity difference fits
Since electron and positron data would be taken at different times with different beam configurations, there is the possibility of a significant offset between the absolute luminosities of the two data sets. In the main text, we include this uncertainty in the error matrix as the luminosity error, σ lum = 0.02, which is assumed absolute. We study here the possibility of simultaneously fitting this luminosity difference together with the Wilson coefficients.
We fit the pseudodata for the LC asymmetries with an overall shift, A lum , added to the pseudodata. Then, we define the χ 2 test statistics as: where we omit the uncertainty in the luminosity difference between e + and e − runs from the uncertainty matrix: However, we keep the luminosity uncertainty in the pseudodata generation. By introducing the luminosity difference, A lum , as a new parameter, we extend our one-parameter and two-parameter Wilson-coefficient fits to (1 + 1)-and (2 + 1)-parameter fits. We find that there are mild correlations, |ρ r | 0.4, between A lum and any C r in the (1 + 1)-and (2 + 1)-parameter fits. In addition, the fitted results for Wilson coefficients have slightly larger uncertainties when the luminosity difference is treated as a fitting parameter. In Fig. 18, we show the 95% CL intervals with and without A lum for the Wilson coefficient C eu in all the four LC asymmetry data sets of interest. In Fig. 19, we compare the 95% CL ellipses of the Wilson coefficients (C eu , C qe ) between the data sets LD4 and LP5 with and without the luminosity difference as a fitted parameter. From these figures, we see that the 95% CL bounds on C eu become 15 to 20% weaker. The difference is less noticeable in the confidence ellipses.

Beam polarization fits
In the same spirit as the previous section, we now consider fitting the beam polarization simultaneously with the Wilson coefficients in an attempt to reduce the uncertainty associated with the experimental limitation from beam polarimetry. We fit the pseudodata for the PV asymmetries by including a factor of P in the SMEFT asymmetries. We then define a χ 2 test statistics as: In this approach, we omit the beam polarization uncertainty, σ pol , from the uncertainty matrix because it is now treated as a fitting parameter:Σ but not during pseudodata generation. The second term on the RHS of Eq. (A3) is added by hand, whereP and δP are the beam polarization value and its uncertainty provided by the polarimetry, respectively, presumably uncorrelated with the asymmetry measurements. The logic behind this addition is that experimentally, the polarimetry does provide knowledge on the beam polarization, but we hope to obtain a better determination of the polarizations within the uncertainty provided by the polarimetry by fitting data with high statistical precision. As for the beam polarization itself, we use a normalized value ofP = 1 in this study for simplicity. Treating the new term to be the contribution of a new observable, we increase the degrees of freedom of the χ 2 distribution by 1. As in the case of luminosity difference fits, we extend our 1-and 2-parameter fits of Wilson coefficients to (1 + 1)-and (2 + 1)-parameter simultaneous fits by including the beam polarization as a new parameter. From (1 + 1)-parameter fits, we find that P and any C r are rather weakly correlated, |ρ r | 0.1, in the polarized hadron data sets, whereas there are strong correlations, |ρ r | 0.7, in the unpolarized hadron asymmetries. We observe similar correlations in the (2 + 1)-parameter fits.
In Fig. 20, we present the allowed intervals of the Wilson coefficient C eu for the nominal-and high-luminosity data sets P4 and ∆P4, while Fig. 21 displays the 95% CL ellipse of the Wilson coefficients (C eu , C qe ) for the same data sets in the nominal-luminosity scenario. We find that bounds from unpolarized hadron data sets become stronger by 30 to 50%, yet the ones from polarized hadron asymmetries become 15 to 20% weaker. The improvement is sharper in the high-luminosity unpolarized hadron sets, whereas the worsening is significant for the nominal-luminosity polarized hadron sets.  One can explain why the bounds become weaker in the polarized hadron sets by referring to the correlations. Since in these data sets, the beam polarization and the Wilson coefficients are found to be weakly correlated, one would naively expect the bounds obtained from single-parameter fits of Wilson coefficients to roughly remain the same on the grounds that P and C k can be thought of almost fully independent so that they will not affect each other in the fits. Thus, any increase in the allowed limits of Wilson coefficient can be attributed to the increase in the number of parameters fitted, which is reflected as the normalization of the uncertainties of the fit. In this section, we present the 95% CL intervals and the corresponding effective UV cut-off scales for all the seven Wilson coefficients in single-parameter fits averaged over 1000 pseudo-experiments. We recall the following abbreviations for the EIC preliminary data sets: • electron PV asymmetries of unpolarized deuteron, A • unpolarized electron-positron asymmetries of unpolarized hadrons, A (H) LC : LD4, LD5, LP4, and LP5 with the same energy configuration as the corresponding D-and P-sets, but with the luminosity of the positron beam assumed to be 10 times smaller than that of the electron beam.
Figs. 22-28 display the 95% CL bounds of each Wilson coefficient for the four primary data families. As in the main part of the manuscript, the intervals are grouped by asymmetries, namely electron PV asymmetries of unpolarized hadrons, A  Table 10.1 of the YR [24]. The high luminosity ("HL") is assumed to be 10 times higher than the nominal one and requires a luminosity upgrade of the EIC. In each block of intervals, there are four double lines in the case of PV asymmetries and four single lines in LC asymmetries. These four lines correspond to the data families D4 (black and its shade), D5 (red), P4 (blue), and P5 (orange). The darker of the two lines indicate the bounds from single-parameter fits with the Wilson coefficient C r , whereas the lighter ones show the bounds on the Wilson coefficient from simultaneous two-parameter fits with C r and the beam polarization. The details of the fits involving the beam polarization as an additional parameter are described in Appendix A 2.
L P 4 L P 5     For completeness, we summarize in Table V

Fits of two Wilson coefficients
In this section, we present the complete set of confidence ellipses for all possible pairs of the seven Wilson coefficients that we consider in this work. The ellipses are plotted at 95% CL and Λ = 1 TeV.
As before, we refer to electron PV asymmetries collectively as unpolarized A PV , hadron PV asymmetries as polarized A PV , and electron-positron asymmetries as lepton-charge A. Data sets with the label NL or HL indicate the luminosity: The nominal luminosity ("NL") refers to the annual integrated luminosity of Table 10.1 of YR [24]. The high luminosity ("HL") is assumed to be 10 times higher than the nominal one and requires a luminosity upgrade of the EIC.
Each figure consists of four panels, containing one of the four families of data sets, namely D4, D5, P4, and P5. We show the fits from polarized and unpolarized PV asymmetry data sets in both nominal-and high-luminosity scenarios for comparison. We remark that the ellipses for the polarized and unpolarized PV asymmetry data sets indicate the results of simultaneous fits on Wilson coefficients with the beam-polarization parameter, P , in light of significant improvements in the results with unpolarized PV asymmetries. Moreover, we include for some representative examples fitted results for the LHC Drell-Yan data, adapting from [6] and [45].                                                                                For completeness, we show the correlations of Wilson coefficients in Figs. 57-60.      FIG. 60. The same as in Fig. 57 but for P5.

Six-dimensional fits
In this section, we discuss projected bounds arising from a fit of six Wilson coefficients. We show projections onto one and two Wilson coefficients from the full 6d hyper-ellipse. The computational power required for higherdimensional fits increases because we increase the number of pseudoexperiments to reflect the required statistics for the fits. We find that N exp = 10 3 pseudoexperiments for 2d fits lead to stable results, meaning the best-fit values and the corresponding bounds do not change for N exp ≥ 10 3 . This number becomes N exp = 10 4 for 3d fits, N exp = 10 5 for 4d fits, N exp = 10 6 for 5d fits, and N exp = 10 7 for 6d fits. We note that starting from 3d fits, the size and characteristics of multidimensional fits stabilize as we increase the number of fitted parameters, in the sense that the bounds do not change further. This indicates that the 6d fits stand as a useful indicator of what happens in the complete 7d fit of the Wilson coefficients.
To illustrate our discussion, we pick the representative Wilson coefficients C eu , C ed , C q , C u , and C qe and consider the data from ep collisions in the configuration 10 GeV × 275 GeV with 100 fb −1 , which is the nominalluminosity P4 data set. In Fig. 61, we compare the bounds from the original 1d fits to the projected bounds from the 6d fit of the Wilson coefficients C eu , C ed , C (1) q , C q , C u , and C qe . We observe that the bounds become 25 to 40% weaker as we increase the number of Wilson coefficients fitted. This is due to an interplay between the increased number of fitted parameters and correlations among them. In Figs. 62-66, we compare the confidence ellipses from the initial 2d fits to the ones in the two-parameter projections of the 6d fit of the aforementioned Wilson coefficients. We find that the bounds become 20 to 30% weaker as in the case of the comparison of the one-parameter fits and projections, which can be explained by the same reasoning as mentioned above. The other choices of six Wilson coefficients lead to similar results. We note that no flat directions appear in these fits, indicating that the EIC can fully probe this parameter space without degeneracies.
q , C u , and Cqe using the data set P4 at Λ = 1 TeV.