Compatibility of Neutrino DIS Data and Its Impact on Nuclear Parton Distribution Functions

In global analyses of nuclear parton distribution functions (nPDFs), neutrino deep-inelastic scattering (DIS) data have been argued to exhibit tensions with the data from charged-lepton DIS. Using the nCTEQ framework, we investigate these possible tensions both internally and with the data sets used in our recent nPDF analysis nCTEQ15WZSIH. We take into account nuclear effects in the calculation of the deuteron structure function $F_2^D$ using the CJ15 analysis. The resulting nPDF fit, nCTEQ15WZSIHdeut, serves as the basis for our comparison with inclusive neutrino DIS and charm dimuon production data. Using $\chi^2$ hypothesis testing, we confirm evidence of tensions with these data and study the impact of the proton PDF baseline as well as the treatment of data correlation and normalization uncertainties. We identify the experimental data and kinematic regions that generate the tensions and present several possible approaches how a consistent global analysis with neutrino data can be performed. We show that the tension can be relieved using a kinematic cut at low $x$ ($x>0.1$) and also investigate a possibility of managing the tensions by using uncorrelated systematic errors. Finally, we present a different approach identifying a subset of neutrino data which leads to a consistent global analysis without any additional cuts. Understanding these tensions between the neutrino and charged-lepton DIS data is important not only for a better flavor separation in global analyses of nuclear and proton PDFs, but also for neutrino physics and for searches for physics beyond the Standard Model.

Due to the weak nature of the neutrino-nucleus interaction, heavy nuclei such as iron or lead have been usually used as targets in neutrino scattering experiments in order to obtain data with sufficiently high statistics.Therefore, if one were to use the neutrino DIS data in an analysis of the structure of the proton, a nuclear correction factor would be required.Indeed it is much more natural to analyze neutrino DIS in the framework of nuclear PDFs (nPDFs).Out of all available up-to-date global analyses of nPDFs, most include a small selection of neutrino inclusive or semi-inclusive DIS data.The reason why nPDF analyses do not include the totality of neutrino DIS data can be traced back to concerns about possible tensions between neutrino DIS data and the charged-lepton data fitted in nPDF frameworks.
In the past decades, there have been several dedicated analyses of neutrino DIS data using the framework of nPDFs.They started with Ref. [22], where it was shown by conducting an analysis of neutrino DIS crosssection data from NuTeV [23] and dimuon data from NuTeV and CCFR [19] that the extracted iron PDFs in the nCTEQ framework led to a nuclear ratio of the charged-current structure function F 2 that is flatter and significantly different from the similar ratio extracted directly from the charged-lepton DIS data, as described, e.g., by the Kulagin-Petti model [24] or the SLAC/NMC parametrization [25].In particular, the lack of shadowing of the charged-current structure function ratio in the lowx (x ≤ 0.1) region is quite atypical.Another peculiarity can also be observed: the typical antishadowing which is present in the neutral current data at moderate x (0.06 < x < 0.3) is shifted to much smaller x.The stark difference in the nuclear correction factor triggered a follow-up study [26], where a global analysis that included charged-lepton and Drell-Yan (DY) data as well as neutrino DIS from NuTeV [23] and Chorus [27] was performed.It concluded that the neutrino DIS data is incompatible with the charged-lepton data citing the high precision of the NuTeV cross-section data and especially the correlated systematic uncertainties as the main reason for the conclusion.Some time later two related studies [28,29] were carried out in the EPS nPDF framework.The authors found only a mild tension between the neutrino DIS data and the charged-lepton DIS data.They further suggested [29] that data normalization might be the reason of the apparent incompatibility.By normalizing cross-section data with the integrated cross-section in each energy bin and using a Hessian reweighting analysis based on linearization of theory predictions near the minimum, it was shown that the neutrino DIS data, in particular those from NuTeV, could be included in a global analysis with charged-lepton DIS data without causing significant tensions.It is worth noting that the NuTeV data used in Ref. [29] were without point-topoint correlations, which as it was also shown in the previous nCTEQ analysis [26] makes a large difference.With uncorrelated systematic errors the NuTeV data can be described with a very good χ 2 even in Ref. [26].Nevertheless, even if NuTeV data is described well, some charged-lepton DIS data, especially those taken on a nucleus close to iron in the mass number, have χ 2 /pt significantly larger than unity.Furthermore, without a proper global analysis, the linearization method employed in Ref. [29] might not be sufficient to capture the true minimum, considering the fact that there are almost four times as many neutrino DIS data points as there are charged-lepton and DY data.
Another intriguing study aiming at comparing the neutrino DIS data with the rest of the data was performed by Kalantarians et al. [30].There, F Fe 2 /F D 2 data from BCDMS and NMC were transformed into F Fe 2 by multiplying the data with F D 2 from the NMC parametrization [25].This neutral current F Fe to the charged-lepton DIS, DY, and neutrino DIS data from NuTeV and Chorus, we now include the W and Z boson production data from the LHC [34][35][36][37][38][39][40], single inclusive hadron production data from both RHIC [41][42][43][44] and the LHC [45][46][47], charm-dimuon data from NuTeV and CCFR [19], and neutrino DIS data from CDHSW [48] and CCFR [49,50].Furthermore, we improve on the treatment of the deuteron corrections which are applied to F 2 theory predictions.We also improve the treatment of normalization uncertainties by fitting their fluctuations to the data.To have maximal discriminatory power from the highly correlated data like NuTeV and Chorus, we take into account their correlated systematic uncertainties in all fits.We also allow the strange quark PDF parameters to vary, in contrast to our previous analysis [26] where we assumed that they are fixed by requiring s + s = κ(ū + d).As a result of all the aforementioned improvements and additions, the analysis presented in this paper is the most comprehensive analysis of the neutrino DIS data available so far.
As a result of our compatibility study we also identify several approaches how neutrino DIS data can be used together with the charged lepton DIS data in global nPDF analyses while avoiding much of the tension.We also present the best approach which will be used in our future global release of nCTEQ nPDFs with neutrino data.In the meantime, we also publish the nPDFs obtained in the current analysis which are our most complete set of nPDFs until now.
The remaining part of the paper is organized as follows.The analysis framework that serves as the basis for this work is briefly reviewed in Sec.II.Section III is dedicated to the neutrino data new to this analysis.This section also contains some preliminary checks of the internal consistency of the neutrino data among themselves.Section IV is the core of this paper and introduces the compatibility criteria used in reaching the conclusions.The main point is the discussion of the compatibility between the charged-lepton and neutrino data.We investigate the impact of data selection, treatment of errors and the kinematic cuts in Sec.V.The details of the combined fit with neutrino and other data are given in Section VI.The whole study is then summarized in Section VII which also provides an outlook and a possible interpretation of the results.In addition, we list the explicit results of all fits performed in the course of this analysis in Appendix A and we discuss normalization issues and our method to handle the d' Agostini bias in Appendix B.

II. ANALYSIS FRAMEWORK A. nPDF fitting framework
The extraction of nuclear PDFs in this analysis is performed using the same framework already employed in the nCTEQ15 analysis [8] and all our subsequent analyses [10,11].Specifically, for a nucleus with mass number A the full nPDF, f A i , is expressed in terms of effective bound-nucleon distributions: where i is a parton flavor, Q is the factorization/evolution scale, x is the fractional momentum of the parton with respect to the average momentum of the nucleons, Z and N = (A − Z) are respectively the number of protons and neutrons inside the nucleus, while f p/A i and f n/A i are the effective bound proton and neutron PDFs respectively.The momentum fraction x in this case takes in principle the values 0 ≤ x ≤ A. However, we assume that f A i (x, Q) = 0 for x > 1 which is reasonable as long as we neglect the motion of bound nucleons inside the nucleus [51].
The bound neutron PDFs can be obtained from the bound proton ones by assuming isospin symmetry.The bound proton PDFs are parametrized at the input scale Q 0 = 1.3 GeV using the following parametrization [8]: where the flavor index i runs over i = uv, dv, g, ū + d, s + s, s−s.Here uv and dv are the up and down quark valence distributions, and g, ū, d, s, s are the gluon, anti-up, antidown, strange, and anti-strange quark distributions, respectively.The free coefficients c i are assumed to be A-dependent and the general form of this dependence is given by Here, p i are the free-proton PDF parameters obtained in a dedicated proton PDF analysis of Ref. [52], which are close in value to the CTEQ6.1Mparameters [53].
We have chosen the free-proton PDF parameters in order to avoid possible inconsistencies when proton PDF analyses use data taken on nuclei.The analysis [52] excludes all nuclear data such as the CCFR F 2 and F 3 neutrino DIS data [49].The nPDFs for different nuclei are obtained by fitting the nuclear parameters a i and b i to the experimental data.In total, there are about 40 a i and b i parameters each.Some of these parameters are constrained by the usual sum rules, but the rest remains to be constrained by the data.Given that in the case of nuclear PDFs the data are not so numerous and precise as in the proton case, many of the free parameters need to be fixed in any nPDF analysis.Comparing two different nPDF extractions can be made difficult if the analyses in question use vastly different numbers of free parameters.In such a case, parametrization bias becomes an issue which is difficult to overcome.In this analysis we have succeeded to perform every relevant fit containing a sufficient number where F D 2 is computed using Eq. ( 5).
of data points with the same large number of free parameters.Only for special fits to a very small subset of data, we were forced to use a smaller number of free parameters to reliably estimate the uncertainties of these analyses within the Hessian approach.
In general, even though the A-dependence of the parton distribution functions given in Eq. ( 4) allows for great flexibility, there is insufficient data to constrain the whole functional form.Therefore, we opt to fix most of the b i coefficients and let them vary only in cases where we expect precise data taken on multiple nuclei can constrain them.

B. nCTEQ15WZSIHdeut
Before discussing the neutrino data, we need to carefully specify the nPDFs we will compare our results against.The global analysis that we use as a reference here is based on the recent nCTEQ15WZSIH analysis [11] which uses charged lepton DIS, DY, LHC W and Z boson production data and single inclusive hadron production data from both RHIC and LHC to determine the nPDFs.
However, we improve upon the nCTEQ15WZSIH analysis in several respects.First, we remove the isoscalar corrections that were applied when the data were published using the same method as used in Ref. [51], to improve the up-and down-quark PDF separation.Moreover, in order to take into account the nuclear corrections in deuteron data, we correct the deuteron F 2 structure function predictions using the method discussed in Ref. [51].Specifically, the deuteron where F D,CJ 2 and F p,CJ 2 are the fitted deuteron and proton structure functions from the CJ15 analysis [2] and F p,nCT EQ 2 is the computed proton structure function using our base proton PDFs.Without this method, the deuteron F 2 is traditionally computed as a simple isoscalar combination, F N 2 ≡ F p 2 + F n 2 [8,12].In Fig. 1, we show the ratio We can see that our treatment for the deuteron structure function modifies F N 2 by ∼ 1% at x ≤ 0.1 and ∼ 3.5% at x ≈ 0.65.The different treatment of the deuteron structure function influences the description of all the chargedlepton DIS data which are published as ratios This set of data includes data taken on a wide range of nuclear targets and it constitutes about a half of the data in the nCTEQ15WZSIH analysis.
For DIS data, we apply our standard kinematic cuts namely we only keep data with 25 GeV2 , where Mp is the nucleon mass. 2 As in [11], we use the same strict p T ≥ 3 GeV cut for all single inclusive hadron data (compared to p T ≥ 1.7 GeV in nCTEQ15 and EPPS16).We have repeated the nCTEQ15WZSIH analysis with all corrections and cuts mentioned above and enlarged the set of free parameters from 19 to 27.Specifically we fit: On top of these free parameters, there are 10 additional free normalisation parameters which are also determined in the fit using the approach highlighted in App.B. Similar to the analysis presented in [11], 7 normalisation parameters are used to describe the single inclusive hadron experimental data and 3 normalisations are used for the description of the W -and Z-boson production measurements from the LHC.After fitting 940 data points from the same experiments that were also used in the nCTEQ15WZSIH analysis [11], we obtain a χ 2 = 735 corresponding to χ 2 /pt = 0.782.
The list of values of all parameters obtained in this analysis is given in App. A. In the following text we refer to this new analysis as nCTEQ15WZSIHdeut.For completeness, in Tab.I, we compare the quality of the new nCTEQ15WZSIHdeut fit with the previous nCTEQ15WZSIH and the nCTEQ15 analyses.The values of χ 2 /pt for each experiment are displayed in Fig. 2. The resulting PDFs are then compared for all relevant flavours at the scale Q 2 = 4 GeV 2 in Fig. 3.For comparison, we use the same ∆χ 2 = 45 tolerance to define the uncertainties for all three     analyses.There are several differences which can be observed between the original nCTEQ15WZSIH and the nCTEQ15WZSIHdeut analyses.In all parton flavors, we observe larger uncertainties compared to the nCTEQ15WZSIH analysis.This is connected to the enlarged number of free parameters which now can more realistically describe the true uncertainty.The differences in the central values for the up-and downquark parton distributions are the expected consequences of removing the isoscalar corrections and of the different treatment of the deuterium in DIS data together with a slightly larger number of free parameters.The differences seen in the gluon distribution can be attributed to different free parameters used to describe the gluon PDF as well as secondary effects on the gluon from altered scaling violations coming from the modified deuteron data.In the case of the strange quark, the only constraint comes from the W and Z boson data from the LHC as well as the sum rules linking all PDFs together.Given the lack of data constraining the strange quark, we conclude that what is displayed in Fig. 3 is just the parametrization bias where even our parametrization with a large number of free parameters cannot reproduce the true uncertainty in the determination of the strange quark PDF, which should be regarded as much wider than the plotted bands in Fig. 3.It is here where neutrino DIS could play a major role in a global PDF analysis, providing additional sensitivity to the strange quark PDF.

A. Neutrino data and observables
As in any global analysis, data selection is an important factor which, as previous analyses of neutrino data show, can largely influence the obtained results.Given that we investigate the compatibility of neutrino DIS data with the rest of nuclear data, we aim at including all available neutrino DIS data.The experimental collaborations usually publish their results for different observables as differential cross-sections or structure functions.Given that the structure functions are extracted from the cross-section data and that this extraction often requires certain assumptions or input from theory, we prefer to use the differential cross-section data whenever possible.
There are two kinds of neutrino data included in the current analysis.All the new data with a breakdown of the number of neutrino and anti-neutrino DIS crosssection data points that satisfy the kinematic cuts Q 2 > 4 GeV 2 and W 2 > 12.25 GeV 2 applied in our analysis are listed in Tab.II.We also give the range of (anti-)neutrino energy bins for each data set.
The largest and the most important contribution comes from the measurements of the inclusive doubledifferential cross section for the scattering of neutrinos and anti-neutrinos on iron or lead nuclei.The data taken on iron targets come from the CDHSW [48], CCFR [49,50] and NuTeV [23] collaborations whereas Chorus [27] data are taken on lead.For Chorus, CCFR and NuTeV data the electroweak corrections were applied directly to the experimental data.The Chorus and NuTeV data provide point-by-point correlated systematic uncertainties which we include in our analysis. 3There is one issue that needs to be mentioned here.Given that the NuTeV experiment was conceived as a followup experiment to the older CCFR experiment and given that in [23] it was claimed that the CCFR experiment had issues such as with mapping of the magnetic field affecting the measurements at large x, we apply a cut excluding all CCFR data with x > 0.4.
Apart from the data mentioned before, there have been measurements of neutrino DIS reported by the NOMAD [54,55], IceCube [56] and Minerva [57] collaborations which we do not consider in this analysis for different reasons.The NOMAD cross-section data would be the most promising given the high statistics and given that the data were taken on multiple nuclear targets.Unfortunately, the inclusive differential crosssection data have never been publicly released.The IceCube data are measured at extremely small x ∼ 10 −6 where a possibly different theoretical treatment might be required and come with large uncertainties.Finally, the latest results come from the MINERνA neutrino scattering experiment on polystyrene, graphite, iron and lead targets.The collaboration published the ratio of the neutrino scattering single-differential cross section, dσ/dx, as function of x and neutrino energy Eν .Unfortunately the average virtuality Q 2 is below the Q 2 = 4 GeV 2 threshold and so the data are excluded from the analysis by our kinematic cuts.
The second class of data we consider is the semiinclusive production of di-muons in (anti-)neutrino DIS measured by the NuTeV and CCFR experiments [19].There are additional numerous data from the CDHS [58], Chorus [59] and NOMAD [60] collaborations which we do not include in our analysis.The older data from CDHS and Chorus experiments provide no additional constraint compared to the di-muon data we include.The NOMAD data are more precise but due to technical difficulties we were unable to make use of them in this analysis.However, at the end of this paper, we compare the results of our analysis against the NOMAD data and show that the theoretical prediction from the final result of our analysis correctly describes the data.Still, precision of the NOMAD data suggests that further studies of their PDF constraints could be valuable.
It is not a simple task to compare the precision of different experimental measurements if the For anti-neutrino data, the order is somewhat different: NuTeV and CDHSW are comparable in precision, followed by CCFR and Chorus.This conclusion has to be taken with a grain of salt.The averaging procedure and most importantly discarding the correlations might change this simple picture.We will perform much more detailed studies in the following.

B. Nuclear corrections from neutrino cross-section data
Before we perform a global analysis including the neutrino data in our nPDF framework, it is instructive to attempt to quantify a nuclear correction factor extracted purely from these data alone.Given that the neutrino double-differential cross-section data are reported as a function of the usual DIS variables x, y, and Eν , while the nuclear ratio is typically given only as a function of x assuming the variation with changing Q 2 is small, an averaging procedure is necessary.We define the nuclear ratio of the cross-section and its uncertainty for each data point as where σ free is the predicted differential cross section using "free" iron or lead PDFs, f A,free i , defined by Here, f p(n) i are the free proton (neutron) PDFs, which in our case are taken from our proton baseline.The quantity ∆σ(x, y i , E i ) is the total sum of statistical and systematic uncertainties for the data points added in quadrature, except for the normalization uncertainty.We construct a weighted average of the nuclear ratios, such that for a given x the weighted-average ratio and its uncertainty are: The weight w i is defined as where the sum runs over data points with the same x.
This averaging procedure is similar to the one used in Ref. [29], although there are differences in the definition of the weight w i and of the uncertainty ∆R(x).In such a procedure the dependence on the remaining variables is averaged out.This of course is only reasonable if there is just a mild dependence of the nuclear correction factor on the remaining variables.We have checked that this x 0.8 0.9 assumption is reasonably valid for a wide range of Q 2 and y within the kinematic range allowed by our cuts.Some deviations from this assumption can be observed below x = 0.015 and above x = 0.75, where R can be spread around unity quite widely.Therefore, any inference based on this averaging procedure in these regions should be done with caution.In Fig. 4, we show the nuclear correction factors R ν (x) and R ν (x) obtained from the inclusive neutrino and anti-neutrino cross-section data from CDHSW, CCFR, NuTeV and Chorus.To better compare the shape of the nuclear corrections from different data sets, we also show an interpolation (solid lines), obtained from fits with the parametrization of the ratio [23] For comparison, we also include the SLAC/NMC nuclear correction factor [25] which approximately describes the nuclear effects in the charged lepton data.
In the left panels of Fig. 4, we show the shape of crosssection ratios where σ f ree is computed using our proton baseline PDFs.We observe that the CCFR and NuTeV ratios generally agree at low x, but the NuTeV ratio is consistently above the CCFR one for x > 0.4.This is consistent with the observation in Ref. [23] where issues with the CCFR experiment were cited which account for this discrepancy.In the following we will also apply a cut x < 0.4 to the CCFR data.Overall, for the iron neutrino data (CDHSW, CCFR and NuTeV), there is no obvious shadowing, i.e. the appearance of R < 1, at low x (x ≤ 0.1) as one expects from the SLAC/NMC model.This is even more so for CDHSW data.However, the bin center correction was not applied for the CDHSW data, which affects largely low-and high-x data [23].In contrast to the data on iron, the nuclear ratio obtained from the Chorus data shows a shape more similar to the traditional SLAC/NMC ratio.
The nuclear ratio defined above obviously depends on the underlying proton PDFs used for the free proton cross-section in the denominator of Eq. ( 6).This dependence can be seen when we compare the left and the right panels in Fig. 4. The right panels show the same nuclear ratios as the ones on the left, but the ratios are constructed using the more recent CT18 NLO PDFs.Here we have used a dedicated fit which does not include any neutrino data in the CT18 analysis to avoid inconsistencies [61].Comparing the nuclear ratios coming from different underlying proton PDFs, we can clearly see differences in the x-shape of these ratios.The largest difference is apparent at low x.The ratios constructed from CT18 NLO PDFs show signs of shadowing at x ≤ 0.1 in contrast to the ones where the nCTEQ15 proton baseline PDFs were used.This should serve as a warning to draw conclusions about the existence of shadowing in neutrino data from observables, which are not purely data driven and depend on some assumptions such as the proton parton distributions.In the previous section, we have investigated the nuclear effects using just the data, constructing the weighted average of cross section ratios.We have observed in Fig. 4 that the resulting x-dependence varies between neutrino experiments and is different from the expected SLAC/NMC result.Here we will go one step further and perform a neutrino analysis using the nPDF framework detailed in Sec.II.In this analysis, which we will refer to as "DimuNeu", we include only the inclusive and semi-inclusive neutrino data listed in Tab.II.Compared to our previous analyses, we improve on the treatment of correlated errors and normalisation uncertainties.The details of this treatment are given in App.B. Before going further, we note that extracting a reliable set of nPDFs from neutrino data alone is not possible without making some assumptions given that the neutrino data alone cannot constrain all possible parton distributions.In this global neutrino analysis, we set the gluon PDF parameters to be the same as those in the nCTEQ15WZSIHdeut fit.Furthermore, we set the d/ū ratio to be the same as in the free proton case, as we assume that the nuclear corrections to ū and d are similar and cancel in the ratio [22].This fit therefore uses 20 free parameters.In addition, the normalizations of all data sets are also determined from the fit, which introduces 10 additional free parameters.The uncertainties of the parameters are determined using the Hessian method (for details see [8]) with the same ∆χ 2 = 45 tolerance criterion as the one used in the nCTEQ15WZSIHdeut analysis.
The results of the DimuNeu analysis are threefold.First, the list of final values of all parameters after the DimuNeu analysis can be found in App. A. Next, the χ 2 values for all data and for each data set separately are given in Tab.IV.Lastly, in Fig. 5 we show the ratio of nuclear PDFs for the whole nucleus to the PDFs for the whole nucleus obtained using the free proton PDFs.We compare the nuclear parton distribution functions extracted from the neutrino data to the ones extracted in the nCTEQ15WZSIHdeut analysis in Sec.II B. We observe that the results from the DimuNeu and nCTEQ15WZSIHdeut analyses are distinctly different for the valence quark PDFs as well as for the non-valence quark PDFs.The shapes are different even if we consider the PDF errors of both analyses.The strange quark nPDF also differs between the two analyses.In the case of iron PDFs the changes in the strange quark PDF are still within the uncertainties but for lead the strange quark PDF is distinctly different.The gluon PDF parameters were fixed and so the gluon PDF is the same in both analyses.
It is instructive to see how the resulting nPDFs from the DimuNeu analysis describe the experimental data.In Fig. 6 we compare the predictions stemming from the DimuNeu analysis for the nuclear correction factor constructed from the F 2 structure functions from the neutral or charged current deep inelastic scattering to the corresponding structure function data.There is a subtlety one has to take into account.In the case of the neutral current DIS (see the left panel of Fig. 6), the data are presented as ratios F A 2 /F D 2 , where the denominator comes from a measurement on deuterium targets.In the charged current case with neutrino beams (see the right panel of Fig. 6), deuterium targets are not heavy enough to generate sufficient statistics.Therefore, one uses a nuclear correction factor constructed as where the charged current structure function In the case of the theoretical predictions, the numerator is calculated using the nuclear PDFs, f A i , for the corresponding nucleus A, and in the denominator the combination of free proton and neutron PDFs, f A,free i , are used instead.In Fig. 6, the experimental points are obtained by dividing the data on F CC 2 by the same "free" PDF denominator as for the theoretical prediction.In Fig. 7 we also show predictions from the DimuNeu analysis for the W ± production at the LHC as a function of the rapidity of the charged lepton y ± .
Based on the total χ 2 in Tab.IV, we see that the DimuNeu result can decently describe all neutrino data.We see however that not all data are described equally well.On one side, both neutrino and anti-neutrino data from CDHSW and CCFR experiments are very well compatible with the DimuNeu prediction.On the other side, all dimuon data and all Chorus data as well as anti-neutrino data from the NuTeV show a mild tension where the χ 2 /pt ∼ 1.2.The neutrino data from the NuTeV collaboration are the most precise and show the largest tension with the DimuNeu analysis.As was stated in previous analyses and verified also in the course of this analysis, NuTeV neutrino data cannot be adequately described in this nPDF framework even if the data are fitted alone.
In the right panel of Fig. 6, we see that the predicted nuclear correction factor, coming from the global neutrino DimuNeu analysis, describes the data from NuTeV and CDHSW within their uncertainty.This   can be compared to the nuclear correction factor from the nCTEQ15WZSIHdeut analysis where the x-shape of the correction factor is completely different and cannot describe the neutrino data at all.We also observe in the left panel of Fig. 6 that the inverse is true for the neutral current data where the nuclear correction factor which describes the neutrino data fails to describe the aforementioned data.This is true almost for any x but the largest deviation can be seen for x < 0.07.Even for mid-x where the shape of the DimuNeu nuclear correction factor would be consistent with the data, it consistently undershoots all data.Here the situation is reversed and the nuclear correction factor from nCTEQ15WZSIHdeut describes the data well.This apparent inconsistency of the nuclear correction factor determined from neutrino data with the rest of the neutral current data is what prompted the series of studies starting with [22].In Fig. 7 we show that not all observables disagree.In the case of the W ± production at the LHC we see a nice agreement between the results from the nCTEQ15WZSIHdeut and DimuNeu analyses.This should come as no surprise given that the W ± production is quite sensitive to the  FIG. 6.The structure function ratio predictions from DimuNeu and nCTEQ15WZSIHdeut fits.The grey bands on the left and on the right highlight the regions without any data points passing the kinematic cuts.gluon PDF 4 which remains fixed and is the same in both analyses.Above, we have verified that the prediction from the DimuNeu analysis correctly describes the experimental data on the F CC 2 structure function by comparing the nuclear correction factor R[F CC  2 ].Given that we have not used the structure function data in our analysis, it is also instructive to see how well the cross-section data are being described analogously to the results and discussion of Fig. 4. For that purpose we return to the weighted average introduced in Sec.III B and in Fig. 8 to check how well the DimuNeu analysis fits the data.Even though all data considered in Fig. 8 correspond to the same observable, the result of the averaging procedure depends on which data set is used in the averaging as different experiments have different ranges in Q 2 which are being averaged over.Therefore, 4 Actually, in case of a nPDF fit without jet data the W/Z LHC data provide the most stringent constraints for the gluon.
separate theoretical predictions for the weighted average for each experiment with the corresponding uncertainties are shown.In constructing the theoretical prediction for the weighted average we have replaced R σ i and ∆R σ i in Eqs. ( 6) and ( 7) by the predicted central value and the theoretical uncertainty stemming from the PDF uncertainty, respectively.We have retained the weights w i calculated from the corresponding experimental data to ensure the same weighing procedure is used for both data and theory predictions.We see that in general the theoretical prediction from the DimuNeu analysis fits the cross-section data as well as it did the structure function data.There is a good agreement between the data and the DimuNeu prediction for all experiments in the intermediate Bjorken-x region.In the large-x region, the DimuNeu result is a compromise between the diverging experimental data where the NuTeV measurement starkly differs from the others.For small Bjorken x the fit is also a compromise given that the CDHSW, CCFR and NuTeV show no distinct shadowing in this region whereas the CHORUS data display a shadowing behavior similar to the neutral current DIS x 0.8 0.9

data.
Given the noticeable difference between the neutrino data taken on iron and the data taken on lead in Fig. 8, one might conclude at first glance that these data are incompatible with each other.However, we see that the DimuNeu analysis can describe both neutrino data on iron and on lead quite successfully within one unified nPDF framework.To investigate the matter a little further, we have performed two separate fits which we label "DimuNeuIron" and "ChorusW".Both fits use only 14 free parameters and compared to the free parameters of the nCTEQ15WZSIHdeut fit listed in Sec.II B all parameters b x i corresponding to the Adependence were held fixed.The reason for fixing these parameters is that both fits include data taken only on one nucleus.In the case of the DimuNeuIron analysis, only neutrino data from CDHSW, CCFR and NuTeV taken on iron were included and in the case of the ChorusW analysis only Chorus neutrino data and LHC data on W -boson production both taken on lead were used.In Fig. 9 we compare the predictions for the charged-current structure function ratios for iron (red) and for lead (blue) from these specialized fits (dashed lines) with the predictions from the global DimuNeu neutrino analysis (solid lines).We see that in general the predictions from the specialized fits agree well with the ones from the global DimuNeu analysis with the sole exception of the large-x region where the precise NuTeV data dominate the global analysis.
The difference in the nuclear correction factor for iron and for lead can come from two sources.The main effect usually comes from the different proton and neutron content of the iron and the lead atoms.The large excess of neutrons in a lead nucleus leads to noticeable differences in predicted observables even though the underlying effective bound proton and bound neutron PDFs are the same as for other elements.The second possible source for the difference is the dependence of the underlying bound nucleon PDFs on the atomic number A. The second effect is typically subleading.We can see the impact of the large neutron excess if we compare the predictions for lead in Fig. 8, where in accordance with the experimental data it was assumed that A = 208 and Z = 82, with the predictions shown in Fig. 9, where A = 208 and Z = 104 were used given that the structure function data from Chorus are isoscalar corrected.We can therefore conclude that the neutrino data from all experiments irrespective if they are taken on iron or lead show similar behaviour for all but large x > 0.5.

IV. NEUTRINO DATA COMPATIBILITY
In this section we will introduce a combined global nuclear PDF analysis including all data from the reference nCTEQ15WZSIHdeut fit (see Sec. II) and all neutrino data discussed in Sec.III.Extending an existing PDF analysis by including new data is a standard and frequent occurrence.Usually one includes new data in a PDF analysis in order to improve on the precision or on the x-Q 2 coverage of previously used data or to constrain PDFs of partons which were previously left unconstrained.In order for the new data to provide all that, it has to be possible to consistently describe them in the underlying theoretical framework based on the factorisation theorem, perturbative QCD and on the x-parametrization of the PDFs at the input scale.Schematically, if the new data cannot be consistently described in a combined analysis, it can mean one of two things.Either the theoretical framework needs to be extended for example by including small-x resummation effects or the target mass corrections or there was a problem with the data acquisition e.g. the experimental errors were underestimated.
Based on the preliminary analysis we have performed on the neutrino deep inelastic scattering data in the previous section, we expect possible large tensions between the neutrino data and the rest of the nuclear scattering data.Therefore, we will investigate the compatibility of the neutrino DIS data with the bulk of the nuclear scattering data in detail.We will take a closer look at the compatibility of the results of each neutrino DIS experiment separately.We will also look into the possibility that all neutrino DIS data are showing significant tensions, which, in one interpretation, may indicate incompleteness in the theoretical framework used to describe neutrino scattering in the nPDF analysis.

A. Compatibility Criteria
Before we dive into the details of the compatibility discussion, we need to clearly specify the criteria for compatibility which we will be using.In general, we will be discussing the compatibility of two data sets S and S in a global fit which includes both of the sets Z ≡ S ∪ S. In our case, the set S will always be the set of data used in the reference fit nCTEQ15WZSIHdeut and the set S will be some subset of the newly considered neutrino data.In what follows, we will be using three different criteria.
∆χ 2 S -compatibility This first criterion for comparison of the compatibility of two data sets S and S uses the χ 2 of the global analyses of the data sets S and Z.We use the χ 2 to assess whether the nPDFs extracted from the fit to the combined data set Z are within the error bands of the nPDFs from a fit to the baseline data set S. It can be shown that in the Hessian error formalism, this happens if and only if the increase of the χ 2 of S before and after including S is less then the tolerance ∆χ 2 S , hence the name of this criterion.To apply this criterion in our case, we have to define a proper tolerance ∆χ 2 S of the global reference fit to the data S which in our case is the analysis nCTEQ15WZSIHdeut discussed in Sec.II.In the nCTEQ15 analysis, we have used ∆χ 2 = 35 with N = 740 data points.However, the nCTEQ15WZSIHdeut analysis contains significantly more data N = 940 so an adjustment of ∆χ 2 S is required.We will make use of the χ 2 -distribution for N degrees of freedom ξ 50 serves as an estimate of the mean of the χ 2distribution and we expect the χ 2 of a good fit to be close to ξ 50 .In the case of nCTEQ15WZSIHdeut analysis where χ 2 0 = 735 < ξ 50 = 939, the fit was better than expected.Due to the large discrepancy between χ 2 0 and ξ 50 = 939, we have decided to rescale all percentiles by a factor γ S = χ 2 0 /ξ 50 .The new rescaled 90% percentile then becomes χ 2 90 = γ S ξ 90 = 779.We can finally define This is the tolerance we use to define the error PDFs for the nCTEQ15WZSIHdeut analysis.
Assessing compatibility using the ∆χ 2 S -criterion has one obvious drawback.If the reference analysis of data S contains a parameter (or a combination of parameters) which cannot be sufficiently constrained, the uncertainty connected to this parameter is often underestimated.This is due to the fact that in the Hessian approach the unconstrained parameters are connected to very small eigenvalues of the Hessian matrix and the diagonalization of a large matrix where the eigenvalues span multiple orders of magnitude is numerically unstable.If the global analysis of the extended data set Z ≡ S ∪ S constrains the previously unconstrained combination of parameters, the resulting PDF is often outside of the underestimated error band of the previous analysis.In this case the criterion signals incompatibility even though there is none.Therefore no matter how useful this criterion is, we cannot rely just on this single criterion.
χ 2 S -compatibility The second criterion approaches the problem of compatibility slightly differently.Using this criterion, we asses if the data sets S and S are described acceptably well in a combined fit to Z ≡ S ∪ S, comparing the quality of the description of the data sets in the combined fit to the fits to the data sets alone.We will consider the data sets S and S are χ 2 S -compatible if both their χ 2 in a combined fit are within at most 90% percentile defined in Eq. ( 15) from their expected value.To account for the cases where a data set cannot be optimally described even in a fit only to the data set itself, we will define the rescaled percentile χ 2 90 = γ S ξ 90 exactly as we did in the case of the ∆χ 2 S -compatibility criterion above.
Similar to the first criterion, using the χ 2 Scompatibility criterion also has its issues.In order (see Eq. ( 14)) is heavily dependent on the number of data points N of the experiment.Therefore, instead of the χ 2 -distribution P (χ 2 , N ) we use a variable S(χ 2 , N ) which is no longer strongly sensitive to the number of data points.Moreover, the variable S(N ) is distributed according to the normal distribution with zero mean and unit variance [16].We can evaluate for each experiment using the number of data points N = N E and χ 2 = χ 2 E and check if the variable for all experiments is distributed according to the normal distribution with the expected mean and variance.This happens if the χ 2 values of all experiments involved in the global analysis are distributed according to the corresponding χ 2 -distributions.On top of checking if S E for the totality of experiments is distributed as expected, we can also identify experiments which are not compatible with this distribution and also quantify to what degree using the standard confidence levels of the normal distribution.

B. Global analysis with neutrino data
We will start our analysis of the compatibility of the neutrino DIS data with the rest of the nuclear scattering data used so far in the nCTEQ analyses by considering a global analysis which adds all available neutrino data to the rest of the nCTEQ data mentioned in Sec.II B. The fit BaseDimuNeu contains all the data from the reference nCTEQ15WZSIHdeut analysis and all inclusive (anti-)neutrino DIS data from the CDHSW, Chorus, CCFR and NuTeV experiments as well as semi-inclusive di-muon data from CCFR and NuTeV.We have to emphasize that there is a disparity between the number of data present in the original nCTEQ15WZSIHdeut analysis (N = 940) and the number of the new neutrino DIS data added (N = 5689).Therefore, the neutrino data will dominate the global analysis and we expect that if there is any tension, it can be seen in a different description of the original data of the nCTEQ15WZSIHdeut analysis.
The global analysis BaseDimuNeu uses the same framework discussed in Sec.II with the same 27 free parameters to determine nuclear PDFs by fitting 6629 data points.We obtain χ 2 = 7532 or alternatively χ 2 /pt = 1.14.Given that all neutrino data could be described with χ 2 /pt = 1.12 and we have added nCTEQ15WZSIHdeut data to the analysis which on its own was described with χ 2 /pt = 0.78, the result of the global analysis can be considered as the first signal that there may be some tension among the data within the analysis.
Specifically, when we compare the description of the subset of the data common to both nCTEQ15WZSIHdeut and BaseDimuNeu analyses, we notice a distinct rise from χ 2 = 735 to χ 2 = 866.This is an increase of 131 which is almost three times larger than the ∆χ 2 =45 which was used to generate the error PDFs of the nCTEQ15WZSIHdeut result.
This, according to the ∆χ 2 S compatibility criterion introduced above, signals that the newly added data are incompatible with the original data of the nCTEQ15WZSIHdeut analysis.All relevant χ 2 values are summarised in Tab.V.
As we have stated previously, violating the ∆χ 2 S compatibility criterion is also related to large differences in extracted PDFs.In Figs. 10 and 11 we show the nuclear PDFs for iron resulting from the BaseDimuNeu analysis and compare them to the nPDFs of the nCTEQ15WZSIHdeut fit including the uncertainties.
The comparison of both analyses is best seen in Fig. 11 where the ratio of BaseDimuNeu and nCTEQ15WZSIHdeut nPDFs is shown.We can clearly see that the up-and down-quark valence PDF distributions as well as the strange-quark nuclear PDF from the global analysis including all neutrino data lie outside or at the edge of the error band of the reference nCTEQ15WZSIHdeut analysis.To exclude the possibility that the newly added neutrino data just constrain previously unconstrained PDF parameters, we investigate also the χ 2 profiles varying the free parameters (see Fig. 12).In Fig. 12 we see that for many quark parameters the result of the BaseDimuNeu analysis is a compromise between the neutral current DIS data already present in the nCTEQ15WZSIHdeut analysis (labeled DIS in Fig. 12) and the newly added inclusive neutrino DIS data (labeled DISNEU).The final minima of the χ 2 function lie frequently between the minima preferred by the DIS subsets.The DIS and DISNEU subsets show clear sensitivity to the quark valence parameters a uv 1 , a uv 2 , a uv 4 , a uv 5 , a dv 1 , a dv 4 , a dv 5 based on their respective χ 2 growth profiles, but with widely-separated preferred values for those parameters.This is a clear sign for tensions between these subsets.On the other hand, the situation is slightly different in the case of the strange quark.There, the minima preferred by the same subsets are also distinct but we 12. Scans of the χ 2 function along the PDF parameter directions varying always one free parameter at a time while other parameters were left fixed at the global minimum of the BaseDimuNeu analysis.The breakdown into χ 2 for classes of experimental data is also shown.can also observe that the neutrino DIS data are much more sensitive to the strange quark parameter variations than the neutral current DIS data sets.This leads us to conclude that, in the case of the strange quark, the neutrino DIS is the data set providing the first strong constraint on the strange PDF parameters and hence the discrepancy is not a sign of tension here.However, there is a small caveat.The neutrino differential cross-section data prefer a different strange quark PDF compared to the di-muon neutrino data.Moreover, the di-muon data and the neutral current DIS data prefer a similar strange quark.This tension can be later seen in Tab.VI where the listed χ 2 /pt of the di-muon data signify that they are described much worse than in the neutrino only DimuNeu analysis.
The difference between the extracted PDFs from the BaseDimuNeu and nCTEQ15WZSIHdeut analyses translates into different predictions for observables such as the ratio of structure functions F 2 and F 3 shown in Figs. 13 and 14 respectively.Here a similar interpretation is possible where we can clearly see that the results of the BaseDimuNeu analysis are a compromise between the nCTEQ15WZSIHdeut results and the results of the DimuNeu analysis which included only the neutrino data.The compromise predictions of the BaseDimuNeu analysis for the neutral-current nuclear ratio are compatible up to 1-σ with the nCTEQ15WZSIHdeut prediction given that the central value lies within the error band of the nCTEQ15WZSIHdeut analysis.In the case of the other observables, the tension is larger.In the case of the charged-current nuclear ratio the results of the BaseDimuNeu are incompatible with the nCTEQ15WZSIHdeut result at x ∼ 0.025 as the difference between the central predictions of the two analyses is larger than the error estimate on either analysis.
The same is true if we would compare the predictions from the BaseDimuNeu and from the DimuNeu analyses.The case of the ratio of the structure function F 3 is a little different.First of all, there was almost no experimental information directly on the structure function F NC 3 from the neutral-current DIS data.Furthermore, the data on the charged-current structure function F CC 3 have larger errors compared to the structure function F 2 .Even with larger errors, the F 3 data from NuTeV experiment (see Fig. 14) are not described particularly well by any of the analyses.
Moreover, similar to the case of the structure function F 2 the predictions of the nCTEQ15WZSIHdeut and the DimuNeu analyses are incompatible with each other.This time the largest tension is found in the interval 0.1 < x < 0.4.The central predictions of the global analysis BaseDimuNeu are in turn outside of the error band of the nCTEQ15WZSIHdeut analysis for 0.15 < x < 0.3.We conclude that the tension which can be observed at the level of extracted PDFs in Figs. 10 and 11 translates also to the ratios of the charged-current structure functions.
To reach a conclusive picture of the compatibility of neutrino DIS data with the remaining scattering data, we will use the other two criteria introduced in the previous section.The χ 2 of the neutrino and the rest of scattering data subsets in the combined analysis are χ 2 = 6666 and χ 2 = 866, respectively (see Tab. V).Using the rescaled percentiles as defined previously, we see that the description of both subsets of data is outside of the 90% percentile (and even outside of the 99% percentile in the case of nCTEQ15WZSIHdeut data), making the data sets incompatible according to the χ 2 S -compatibility criterion.
Lastly, we will look into the details of how well all experiments are described in the combined global analysis with all neutrino data.
In contrast to using the rescaled percentile to account for imperfect description of data, we will use the distribution of the S(χ 2 , N ) variable for all the experiments in the combined analysis.Considering the whole distribution allows for the possibility that some experiments in the global analysis are not described well leading to S E > 0 and that some are over fitted (S E < 0).Before we investigate the S E distribution of the combined analysis, we will review the same distribution for the reference nCTEQ15WZSIHdeut analysis which is shown in the left panel of Fig. 15.
After analyzing the distribution and determining the mean (µ = −0.74)and the standard deviation (σ = 1.12), we can see that the nPDF framework with 27 free parameters is describing the data too well on average but the spread is still compatible with the ideal distribution of the S(χ 2 , N ) variable.The distribution of S E in the case of the BaseDimuNeu analysis is shown in the middle panel of Fig. 15 and from the characteristics of the distribution, it is clear that on average experiments are still described well (µ = 0.08).However, this time the standard deviation σ = 2.54 signifies that there are more outlier experiments.Our interest is twofold.First, we would like to compare the description of the experiments contained in nCTEQ15WZSIHdeut and the subset of the same experiments in the combined analysis BaseDimuNeu.We show the distribution of the nCTEQ15WZSIHdeut experiments in the BaseDimuNeu analysis in the right panel of Fig. 15.Comparing how these two analyses describe the same set of experiments, clearly points to the BaseDimuNeu analysis being a compromise given that the description of this subset of experimental data is worse than in the reference analysis (µ = −0.26 and  NuTeV cross-section data, Chorus cross-section data and above all the di-muon data in the compromise fit of the BaseDimuNeu analysis is much worse than in the reference only neutrino DimuNeu analysis.Moreover, if one examines the shifts in the description of the experiments in the reference nCTEQ15WZSIHdeut analysis seen in Fig. 15 more closely, we can discover large shifts in χ 2 /pt or alternatively in the S E variable especially in precise DIS experiments (for details see Tab.VII).These facts all together lead us to conclude that the inconsistency signalled by the other criteria is justified and there is indeed a large tension between the neutrino data and the rest of the scattering data.
The crucial question which we will address in the final part of this paper is if there is a way to include the neutrino DIS data in a combined analysis while at the same time avoiding large tensions and incompatibilities.

V. CONSISTENT GLOBAL NPDF ANALYSIS WITH NEUTRINO DATA
In the previous section, we have shown that incorporating neutrino data into the nCTEQ framework can produce significant tensions among key data sets.Moreover, we have observed in Sec.III that these include tensions among different neutrino scattering measurements, most notably among the ones taken on iron from the CDHSW, CCFR and NuTeV collaborations and those taken on lead from the Chorus collaboration. 5o complicate matters even more, the neutrino inclusive DIS data and the neutrino di-muon data each prefer a different strange quark PDF, leading to substantial tensions as well.The goal of this section is to explore ways to include neutrino data in a global analysis so that these large tensions can be avoided or mitigated.
Before we consider a global analysis, we will introduce a series of fits where on top of all data from the reference nCTEQ15WZSIHdeut analysis, we include neutrino and anti-neutrino data from one single experiment.This way we can explore tensions of neutrino data from every single neutrino experiment with the reference analysis without considering any tensions among the neutrino data themselves.We show the statistical results of four analyses (BaseChorus, BaseCDHSW, BaseCCFR and BaseNuTeV) in Tab.VIII.The results show that apart from the data from the Chorus experiment, adding the other neutrino experimental data causes tension with the neutral current scattering data.This should come as no surprise in light of the nuclear correction factors extracted from the neutrino and anti-neutrino data shown in Fig. 4, where only the nuclear correction factor from the Chorus neutrino and anti-neutrino data has a shape similar to the one preferred by the neutral current scattering data.Given the results shown in Fig. 4, we clearly expect the tensions for the other experiments to come from neutrino data in the low-x and/or in the high-x kinematic region.We will use this information in the following.
Aiming for a global analysis without large tensions among data sets, there are several possible approaches one can take: 1.If the tensions can be attributed to a specific kinematic region, they can be removed by imposing a kinematic cut on the neutrino data.
2. Large tensions can often be caused by very precise experimental data, and a compromise can be reached if it is believed that the estimate of the experimental errors is underestimated.In such a case, the errors might be artificially enlarged.
3. The last option is to identify experiments which are still consistent with the bulk of the original data and include only those in our analysis.
We will investigate all of these approaches in the following.
A. Neutrino DIS data with x > 0.1 The large tensions and incompatibilities observed in the previous section were not completely surprising considering the ratio we have extracted from the crosssection data in Sec.III B and which as shown in Fig. 4, shows a markedly different shape of the nuclear correction factor, especially in the small-x and in the very large-x regions.Given that our conservative kinematic cuts on Q 2 and W 2 are already effectively restricting the large x region, the only way how we can resolve the tension using a kinematic cut is to exclude the low-x neutrino data.
Using arbitrary cuts to remove the data which cause the largest tensions in each experiment is not in line with the philosophy of a global analysis, because it introduces a bias which such an analysis tries to avoid.One possible motivation for using a cut to remove data could be signs that the theoretical description of the data in a specific region is inadequate.In this section, we will assume that the large tensions in the low-x region may be due to e.g. a different mechanism for nuclear shadowing in charged current DIS [62] which is not properly included in our theoretical framework.A different reason one could have to justify a kinematic cut is an internal tension between all neutrino data in this region.We can see in Fig. 4 that there is indeed such a tension in the low-x region, especially between the NuTeV and CCFR data on one side and Chorus data on the other.
Citing any of these reasons, we will employ an arbitrary constraint, x > 0.1, which the charged current DIS data have to fulfill.This applies to all inclusive DIS and dimuon data.
To show the impact of such a cut, we have performed an analysis similar to the global BaseDimuNeu analysis requiring that all neutrino data satisfy the constraint, x > 0.1.
This analysis, called in the following BaseDimuNeuX, uses the same number of free parameters and, similarly to the previous analysis, also fits the normalisation of all neutrino experiments.
The kinematic cut removes 1045 data points from the low-x region of neutrino scattering data.The result of this analysis has χ 2 /pt = 1.04.Further details and the breakdown of the χ 2 for the usual data subsets are listed in Tab.VIII.
Analyzing the statistical properties, we see that with ∆χ 2 S = 46, which is approximately within the 91-percentile, the analysis BaseDimuNeuX is barely consistent with the original data in the nCTEQ15WZSIHdeut analysis.A closer look at Fig. 16 reveals that most experiments are fitted well except only a few outliers.The tensions are experienced by the NuTeV neutrino cross-section data (S E = 9.72 largest not shown) and by the NuTeV anti-neutrino data (S E = 3.37).Without these data the S E distribution would be very similar to the one of the reference analysis shown in Fig. 15.
In Figs. 17 and 18 we compare the extracted nuclear PDFs to the ones of nCTEQ15WZSIHdeut.If we first focus on the central values of the nuclear PDFs extracted in the BaseDimuNeuX analysis, we observe that except for the strange quark PDF, the central values are within the error bands of the reference analysis.This nicely highlights the usefulness of the ∆χ 2 -criterion.Comparing the results with those of the BaseDimuNeu analysis shown in Fig. 11, we see that the shapes of the central values are very similar.This indicates that the tensions from the original global analysis are not completely removed but just reduced in size.The unexpected part of the result which can be seen in Fig. 17 is the uncertainty band of the strange quark and gluon PDFs.The large uncertainty of the strange quark is a result of two competing preferences for the strange quark PDF from the di-muon and the neutrino inclusive cross-section data.As a result the uncertainty is enlarged to account for this tension.The reduced gluon PDF uncertainty is due to adding a large number of precise DIS data, constraining the gluon via the NLO sensitivity of the DIS process to the gluon PDF.
The predictions for the nuclear correction factors from the neutral and charged current DIS are shown in Fig. 22 and compared with those of the reference analysis.There we can see that, as expected, the much different behavior of the theoretical prediction in the low-x region, which was present in Fig. 13 for the BaseDimuNeu analysis, is largely gone and the prediction has a larger uncertainty band for the charged current nuclear correction factor.Moreover, excluding neutrino data with x < 0.1 from the analysis significantly effects the prediction of the nuclear correction factor also in other regions in x.In Fig. 22 we see that the structure function data from NuTeV and CDHSW are not correctly described even in the intermediate x region and only the large x behavior is driven by the NuTeV data and remains very different from the predictions of the reference analysis.
Overall, we see that employing the cut x > 0.1 to all neutrino data reduces the tensions just enough for this fit to be considered consistent.However, some problems still remain.The tension in the previously well determined valence quark PDFs is still present and the NuTeV crosssection data is still badly described.Moreover, all this has been achieved after removing the small-x and large-x data where the tensions are the largest.It needs to be stressed once more that this analysis can be considered the final result only if a plausible explanation for the additional kinematic cut is put forward.All panels show the fitted Gaussian distribution to the actual SE distribution (blue) compared to the ideal Gaussian SE distribution with µ = 0 and σ = 1 (red).Note that in the case of the BaseDimuNeuX analysis we do not show a bin with SE=9.72 which corresponds to the NuTeV neutrino data.

B. NuTeV with uncorrelated systematic errors
The second possible approach to lessen the tensions we consider is to enlarge the errors of the experimental data causing the tension.An equivalent to enlarging the errors of all data of a data set is to introduce a weight for this data set in the calculation of the χ 2 -function.We have investigated this option in our previous analysis [26] and found no acceptable way to include the neutrino DIS data in a global analysis.
In a similar spirit, previous analyses [26,28] enlarged the errors of the NuTeV cross-section data by not considering the correlated systematic errors.Let us therefore explore the effect of neglecting these correlations on the combined analysis.First, we have performed a fit with the data from the nCTEQ15WZSIHdeut analysis and only from the NuTeV experiment using uncorrelated systematic errors.The analysis BaseNuTeVU clearly shows that with uncorrelated systematic errors the framework we use to fit the experimental data can, for the first time, describe the NuTeV data well with χ 2 /pt=0.93.
Moreover, comparing to the BaseNuTeV analysis which used correlated systematic errors, we see that the tension with the neutral current data is reduced but still present (for details see Tab.VIII).This shows that the inconsistencies cannot be attributed solely to the use of correlated systematic errors.
For completeness, we have also performed a global analysis much like BaseDimuNeu but without correlations in the case of the NuTeV data (called BaseDimuNeuU).Here a similar picture emerges.The neutrino data are described much better (χ 2 /pt=0.98),but the tension with the neutral current data is unchanged.Some details of the tensions are again visible in the S E -distribution shown in Fig. 16, where the standard deviation of the distribution is much larger than unity (σ = 1.89).Large S E contributions can be traced back to the neutrino di-muon data from both CCFR (S E =4.77) and NuTeV (S E =3. 19) which as we have seen before prefer a different strange quark PDF compared to the inclusive neutrino data.The tensions with the neutral current DIS data have also not improved but rather got worse compared to the BaseDimuNeu analysis (see Tab. VII).The largest S E  contributions still come from the Ca/D and C/D data from the NMC collaboration (S E =3.91 and S E =2.45 respectively).Therefore, we conclude that the use of correlated systematic errors for the NuTeV data has no effect on the compatibility of the neutrino data with the rest of the scattering data and neglecting the correlations does not reduce the tensions, even though the neutrino data seem to be described well overall.

VI. GLOBAL ANALYSIS WITH CHORUS AND DI-MUON DATA
As we have shown in Sec.IV, the global analysis of all available data where also all neutrino data are included leads to large tensions.Furthermore, we have shown that these cannot be sufficiently removed by introducing a kinematic cut or by neglecting the correlations of the systematic errors of the neutrino experiment where the tensions are the largest.One option which we have not yet explored is to try to identify a subset of the neutrino data which shows no or little tension.Based on what we have observed in previous analyses, we will add all di-muon and both Chorus neutrino and anti-neutrino scattering data to our global analysis and disregard all other (anti-)neutrino data.We will refer to this global analysis as BaseDimuChorus.The statistical results of this analysis are also given in Tab.VIII and the total χ 2 /pt = 0.97.As can be seen from the details in Tab.VIII, in this combined analysis, all data from the reference nCTEQ15WZSIHdeut analysis as well as all neutrino data are described well.We have performed a dedicated analysis of only di-muon and Chorus data (DimuChorus analysis) so that we can assess how well FIG.21.Scans of the χ 2 function along the PDF parameter directions varying always one free parameter at a time while other parameters were left fixed at the global minimum of the BaseDimuChorus analysis.The breakdown into χ 2 for classes of experimental data is also shown.We note that in this case "DISNEU" refers to the Chorus data which is the only inclusive neutrino data used in this fit.these data are described in the combined analysis.Using the rescaled percentiles defined above, we see that the descriptions of both the nCTEQ15WZSIHdeut data and di-muon and Chorus data are both within the 90% percentile of the χ 2 -distribution.In Fig. 16 we also show the S E distribution and clearly see that on average the data are over fitted (µ = -0.54)and that the standard deviation of the distribution is larger than the one for nCTEQ15WZSIHdeut (σ = 1.28).This is due to the new neutrino data given that the neutrino cross-section data from Chorus are fitted to χ 2 /pt = 1.27 (S E = 3.61) and also the di-muon data from CCFR to χ 2 /pt = 1.68 (S E = 2.70).However, we can see that these data were not described much better in any other analysis and given that all other criteria do not signal inconsistencies, we can look at these results as statistical fluctuations.
In Figs.19 and 20 we show the extracted nuclear PDFs from this analysis and compare them to those extracted from the reference nCTEQ15WZSIHdeut analysis.We can see that the central values are almost identical for all but the strange quark PDFs, where the addition of new neutrino data leads to a shift in the central value of the strange quark PDF.Moreover, the neutrino data are also more sensitive to the strange quark, which is reflected in the noticeably reduced uncertainty.The effect of better flavor separation of the quark PDFs thanks to the addition of the (anti-)neutrino cross-section data from Chorus can be also observed in the reduced uncertainties of the valence quark PDFs.From the scans of the χ 2 function along the free parameters and the breakdown into separate contributions to the global χ 2 stemming from different experiment classes shown in Fig. 21 we can read off the details, which subset of experiments is responsible for constraining specific parameters.We can infer from Fig. 21 that the valence quark and the antiquark parameters are mainly constrained by the neutral current DIS experiments while the gluon parameters are constrained by the vector boson production processes at the LHC and from the single inclusive hadron production processes.Most importantly for this analysis, we see that the strange quark parameters are constrained by the dimuon data and also from the Chorus inclusive data alike.Comparison between the data from the NOMAD experiment [60] and our theory predictions using our fitted PDFs for the ratio of the di-muon production and the total charged current DIS cross-section.
and 23.The predictions from the BaseDimuChorus and the reference nCTEQ15WZSIHdeut analyses are almost identical and we can observe a reduction in the uncertainties after adding the Chorus and di-muon data.
In the case of the charged current nuclear correction factor for the structure function F 2 , we see that the theoretical prediction from the BaseDimuChorus analysis does not describe the structure function data from NuTeV or CDHSW well.This is to be expected as we have omitted the corresponding NuTeV, CCFR and CDHSW cross-section data from the fit as they were the source of inconsistencies.In the case of the structure function F 3 , neither the predictions from the BaseDimuChorus or from the BaseDimuNeuX analysis can describe the F 3 data from NuTeV well.We should note that even though the normalization of the crosssection data from NuTeV (and also from the other collaborations) was allowed to vary as a part of the fitting procedure, no shift was applied to the structure function data shown in Figs.22 and 23.Shifting the NuTeV data by the normalization of 3.6% determined in the BaseDimuNeuX analysis would improve the tensions between the data and the theoretical prediction for both structure functions from this analysis.
Finally, in Fig. 24 we also compare the theoretical predictions for the ratio of di-muon and charged current total cross-sections measured by the NOMAD collaboration as a function of the incoming neutrino energy.
We see that the prediction from the BaseDimuChorus analysis where the strange quark PDF is largely determined by the CCFR and NuTeV di-muon data, describes the NOMAD di-muon data very well for all incoming neutrino energies.We also observe that the uncertainty on the prediction is much larger than the experimental errors indicating that including this data in our future analysis can lead to a substantially more precise extraction of the strange quark PDF.Given the large uncertainties on all theoretical predictions shown in Fig. 24, we can consider the NOMAD data to be described well enough even by the nCTEQ15WZSIHdeut and BaseDimuNeuX analyses.This is an indication of a realistic estimation of the uncertainty of the strange quark PDF in these analyses.
Out of all possible approaches listed at the beginning of Sec.V, only the last one presented here led to a combined analysis compatible with the reference analysis nCTEQ15WZSIHdeut.Moreover, the neutrino data included in this analysis provided a much improved description of the strange quark PDF.

VII. CONCLUSIONS AND OUTLOOK
The aim of this analysis was to take a second look at the (anti-)neutrino deep inelastic scattering data and see if, after all the developments of recent years, a conclusion different to the one presented in our analysis [26] can be reached.As our previous study of the neutrino data predates the nCTEQ15 analysis and any updates thereafter, one could have imagined a shift in the outcome.Moreover, compared to our previous analysis, we were now in a position to use different tools to analyse the compatibility of the neutrino DIS data.We have also added other neutrino data sets to make the current analysis much more comprehensive.
The analysis presented in this paper starts by collecting all relevant updates to the nCTEQ15 analysis to form the reference fit to use in comparing the compatibility of neutrino data.This is then followed by reviewing the neutrino data and presenting the extraction of effective nuclear correction factors from the crosssection data.On top of that, a fit to all neutrino data is performed and the results are compared with the reference analysis.
In the main part of this analysis in Sec.IV we have performed a global fit (BaseDimuNeu) where we have added all neutrino data to the extended nCTEQ15 analysis.
We have observed large tensions in the previously well determined valence quark PDFs and, even in the strange quark PDF determination, tension among the neutrino data is visible.Therefore, the first important conclusion of this analysis is that, due to the large tensions, the bulk of neutrino data is considered incompatible with the data of the baseline analysis or even among each other.
In an effort to recover at least a subset of neutrino data to be used in a global analysis, we have proposed three strategies to alleviate the tensions between the neutrino DIS data and all the data in the reference analysis.We have analyzed the possibility of neglecting the correlations in the systematic errors of the NuTeV experiment, which are responsible for a substantial part of the tensions in the neutrino data itself.This yielded a much better description of the neutrino data, but the tensions with the original data of the nCTEQ15WZSIHdeut analysis remained.
Since the neutrino data introducing tension at high Bjorken x had already been removed by initial global kinematic cuts, the next possibility we investigated was the introduction of an arbitrary kinematic cut to remove the remaining problematic neutrino data in the region of low Bjorken x (BaseDimuNeuX).As expected after the removal of this data, which causes most of the remaining tension, the description of all the data improved and this can in principle be considered a way to go.However, for this possibility to be viable, a reason for introducing such a cut has to be provided.It was hypothesised (e.g., in [62]) that shadowing in neutrino scattering on nuclei works differently than in the neutralcurrent DIS, which is the cornerstone of the reference PDF analysis.If this were indeed the case, one would have to modify the theoretical predictions for neutrino scattering.Alternatively, before doing so a cut might be introduced to remove data which do not have a proper theoretical description.In such a case, the results of the BaseDimuNeuX analysis might be considered the final result of this study.
Given, however, that an alternative mechanism for shadowing in neutrino-nuclei interactions is not yet completely established and is merely hypothesised, we have put forward a different final result of our compatibility analysis.We have identified a subset of neutrino data which has no tension with the data in the reference analysis and so it can be safely included in a combined global analysis.Unfortunately, the majority of neutrino DIS cross-section data have been left unused in the process.Including just the scattering data from Chorus and the di-muon data from NuTeV and CCFR, we have performed the analysis called BaseDimuChorus.The result of including the new data is a much improved description of the strange quark PDF.
Even though we have found a way to include some neutrino data in our analysis in order to improve the determination of the strange quark PDF, the fact that the bulk of the DIS neutrino cross-section data is incompatible with the neutral current DIS data is established.Without new experimental data on neutrino-nucleus interactions in the DIS regime, there is no way to decide if this inconsistency is due to a different mechanism for the neutrino-nucleus interaction or simply a sign of problems in the acquisition of the current neutrino experimental data.The resolution could have come from the high-statistics NOMAD experiment but even after more than 20 years only the results of the di-muon analysis were publicly released so far.Unfortunately, after plans for a new neutrino scattering experiment were not followed-up on [63], no new highenergy neutrino scattering experiment is currently in planning.Nevertheless, there is potential to obtain new crucial data from novel ideas or experiments such as the proposed Forward Physics Facility [64] at the LHC or from precise measurements of charged current DIS processes at the future Electron-Ion-Collider [65,66].
The normalization uncertainty is a scale uncertainty that affects both the central data and its uncertainties.The conventional way which is still often used to include the normalization uncertainty in a χ 2 fitting procedure is by constructing a covariance matrix in the following way: where D i is the i-th data point, σ i , σiα and σnorm are the statistical uncertainty, systematic uncertainty from α-th source, and the normalization uncertainty.Using Eq. (B1) during χ 2 fitting can lead to d' Agostini bias [67] which causes the fitted theory to be much lower than expected.Furthermore, the bias becomes worse as the number of data points increases [67,68].Using Sherman-Morrison formula [69] to write the inverse of C D : it is straightforward to prove that using the covariance matrix (B1) is equivalent to using the following χ 2 function : Here, T (a) is the theory prediction and both the theory parameters a and the normalization one r are to be fitted to the data.The equivalence means that Hence, using (B4) will also lead to same d' Agostini bias.
To illustrate how the bias could really affect fits with high statistic neutrino data such as NuTeV and Chorus, we have performed fits using Eq.(B4) with the individual NuTeV and Chorus data.During the fit, we open 12 parameters and fixing the gluon parameters to the same values as in nCTEQ15 analysis [8].We obain χ 2 /N = 0.86 and χ 2 /N = 0.95 for the NuTeV and Chorus fit respectively.We plot the weighted average of data/theory in top panel of Fig. 25.The figure shows that the theory is severely below the data.Even though the normalisation uncertainties in both experiments are the same (2.1%), the bias in the NuTeV fit is more severe than in the Chorus fit.The reason for this is the much larger number of data points in NuTeV (2136 points) than in Chorus (824 points).The weighted average of the data/theory from fits with NuTeV and Chorus data where the normalization uncertainties are treating using (B4) (top panel) and using the method adopted in this work (B14) (bottom panel).
To avoid d' Agostini bias, several prescriptions exist in the literature.The first method is to use the following χ 2 function [67] : (B6) This method requires to fit the normalization fluctuation, r, directly to the data.The main drawback of this   approach is that the number of normalization parameters can become large, and in case there are many data sets in the global fit, even comparable to the number of PDF parameters.This causes the fit to be prone to numerical problems, such as saddle point or local minimum trap.The larger number of parameters also means the computing cost will increase.
It is worth mentioning that as the normalization fluctuation parameters are basically nuisance parameters fitted to the data, thus their uncertainties must be taken into account when estimating the uncertainties of the fitted PDFs.An easy way for an error estimation with nuisance parameters by freezing them to the minimum point of the χ 2 will result in an underestimation of the true uncertainty.A consistent way to include the uncertainty of the nuisance parameters is given by profile likelihood method.For a χ 2 (aµ, r i ) function, where aµ, µ = 1, ..., N denotes the parameters of interest (PDF parameters) and r i , i = 1, ..., M are the nuisance parameters, one defines the 'profile' χ 2 function as which is a function of PDF parameters only.The Hessian-based error PDF determination can be done using this profile χ 2 p .However, the computation of χ 2 p is expensive as there is no closed-form solution for χ 2 p , hence this method is impractical.An alternative, but equivalent, method is to use the full χ 2 (a, r), but in the Hessian error estimation, the inverse of the N ×N effective Hessian matrix is given by N ×N -submatrix of the inverse of the full (N + M ) × (N + M ) Hessian matrix [70].To prove this, let H p µν be the second derivative of χ 2 p (a) with respect to the theory parameters aµ and aν , where µ, ν = 1, ..., N .Let H µi , Hµν , and H ij be the second deriative of χ 2 with respect to aµ and r i , aµ and r i and r j .Here, i = 1, ..., M .By implicit differentiation, the Hessian H p µν can be written as where r(a) = arg minr χ 2 (a, r).The derivative ∂ r/∂aν evaluated at any a is hard to be calculated as the explicit function r(a) is unknown.However, for a = â = arg mina χ 2 p (a), we can express the derivative as where Hr is an M × M matrix whose components are the same as H ij .Note that all the Hessian matrices on the RHS are evaluated at the minimum â.Inserting this to (B8), we obtain For any block matrix : with A, B, C, D are N × N , N × M , M × N , and M × M matrices, the first (upper left) N × N component of P −1 is given by (A − BD −1 C) −1 .Therefore, one immediately see that as stated before.
An alternative method to include normalization uncertainties in a global fit is to use t 0 -method as explained in detail in [71].This method basically set the covariance matrix: where T 0i is the theory prediction from previous iteration of the fit and C ij is the original covariance matrix without normalization uncertainties.This method eliminates the nuisance parameters from the χ 2 function and hence their uncertainties are automatically included.As the normalization is eliminated, it is not clear how one can obtain the estimated normalization parameters.Knowing the estimated normalization is important for data-theory plotting purpose and for sanity check if its value is close to unity.In this work, in order to treat normalization uncertainties, we adopt the following prescription χ 2 r (a, r) = i,j We will see that this method is equivalent to the t-method discussed in [71].(B14) can be rewritten as where It is clear now that the fitted normalization is given by : Furthermore, the χ 2 at r is given by : and we have used the following formula for the inverse of C T : This equation follows from Sherman-Morrison formula [69].Thus, the fitting normalization uncertainty in this way is equivalent to using an effective covariance matrix C T .The advantage of using this approach is that the nuisance parameters are now completely eliminated and the Hessian errors automatically take into account the uncertainty of the nuisance parameters into the estimation of error PDFs.As the difference between formula (B14) and (B6) essentially comes from the penalty term, then this method is equivalent to (B6) if the optimal normalization parameter r is not far from unity, which is usually the case.
It is trivial to generalize this method to a case where there are more than one data sets that share the same normalization.In such case, the fitted normalization formula (B19) still hold, but A, B and C are modified as where s denotes the data set s and the sum is done over all data sets that share the same normalization.
In order to contrast the fit results obtained with formula (B4) leading to the d'Agostini bias, we performed analogical fits using formula (B14).In the bottom panel of Fig. 25, we show the weighted average of the data/theory for fits with the individual NuTeV and Chorus data, where now the (B14) is used.We obtain χ 2 /N = 1.36 and χ 2 /N = 1.07 for the NuTeV and Chorus fits respectively.We can see that for both NuTeV and Chorus fits the ratio becomes much closer to unity, as one could expected having in mind that the normalization uncertainty for these data is ∼ 2%.The relatively high data/theory values for the Chorus fit at x > 0.4 is related to large systematic uncertainties (hence large systematic theory shifts).Comparing the upper and lower panels of Fig. 25 the difference in the results of the NuTeV fit is especially striking.It also confirms that d'Agostini bias is getting larger with the number of data points.

FIG. 3 .
FIG. 3.The ratio of nuclear parton distribution functions of the nCTEQ15WZSIH and nCTEQ15WZSIHdeut analyses with respect to the nCTEQ15 analysis for lead at the scale Q 2 = 4 GeV 2 .

FIG. 5 .
FIG. 5.The ratio of nuclear parton distribution functions for the full nuclei -iron (A = 56, Z = 26) (top) and lead (A = 208, Z = 82) (bottom) -to the nPDF of full nuclei made up of free protons and neutrons both at the scale Q 2 = 5 GeV 2 .
FIG. 7. Comparison between CMS W ± boson production cross section data with the theory predictions from our fits.The green (red) bands show the theory uncertainties from nCTEQ15WZSIHdeut (DimuNeu) error PDFs.All theory predictions have been shifted by their respective fitted normalization shift.

FIG. 15 .
FIG.15.Distribution of the variable SE for all experiments in the nCTEQ15WZSIHdeut analysis (left) and for all experiments in the BaseDimuNeu analysis (middle).The right panel shows the distribution of the variable SE from the BaseDimuNeu analysis for experiments in nCTEQ15WZSIHdeut.All panels show the fitted Gaussian distribution to the actual SE distribution (blue) compared to the ideal Gaussian SE distribution with µ = 0 and σ = 1 (red).Note some of the SE values lie outside the plot range.

FIG. 18 .
FIG.18.The fitted iron PDF ratio to nCTEQ15WZSIHdeut.All uncertainty bands are obtained using the Hessian method with ∆χ 2 = 45.
FIG.25.The weighted average of the data/theory from fits with NuTeV and Chorus data where the normalization uncertainties are treating using (B4) (top panel) and using the method adopted in this work (B14) (bottom panel).

TABLE I .
Comparison of the χ 2 /pt for the nCTEQ15, nCTEQWZSIH and nCTEQ15WZSIHdeut analyses for selected data sets.Numbers appearing inside brackets show the χ 2 /pt values for data sets that are not used in the corresponding fits.

TABLE II .
New neutrino data sets used in this analysis.

TABLE III .
Relative experimental uncertainties (in percent) of various data sets at Eν ∼ 85 GeV where all the data sets overlap.

TABLE V .
Statistical information such as the total χ 2 and number of data points for all analyses discussed here are presented.Moreover, the χ 2 -percentiles with respect to the reference fit nCTEQ15WZSIHdeut (denoted S) and to the only neutrino DimuNeu analysis (denoted S) are also given.E -compatibility The last criterion used in our analysis is yet another alternative to investigate compatibility of data sets in a combined global analysis.Here we will consider only the global analysis of the combined data sets Z ≡ S ∪ S and investigate the quality of description of each experiment E in this analysis.The comparison of the quality between two different experiments is made difficult by the fact that the χ 2 -distribution P (χ 2 , N ) S

TABLE VI .
Statistical information on the description of the neutrino data sets used in different analyses.

TABLE VII .
Statistical information on the description of the selected neutral current DIS data sets used in the reference nCTEQ15WZSIHdeut and BaseDimuNeu analyses.As expected the worse description can be traced back to the neutral current DIS experimental data which are very sensitive to the up-and down-quark PDF which is one of the PDFs mostly shifted in the combined analysis.The reason why the previous two compatibility criteria signal a problem is hidden in the description of neutrino data.The large standard deviation is mostly caused by the NuTeV neutrino and anti-neutrino crosssection data having extremely large |S E |-values, S E = 13.05 for neutrino (not shown on plot) and S E = 5.5 BaseDimuNeu (see Fig.15and Tab.VI), we can identify the origin of the inconsistencies signaled by the first two compatibility criteria.For the χ 2 /pt and S E data for all neutrino experiments shown in Tab.VI, we can see that the description of the

TABLE VIII .
Statistical information such as the total χ 2 and the number of data points for all analyses discussed here are presented.Moreover, the χ 2 -percentiles with respect to the default data sets of the reference fit nCTEQ15WZSIHdeut (denoted S) and to the DimuChorus analysis (denoted S) are also given if applicable.Distribution of the variable SE for all experiments in the BaseDimuNeuX analysis (left) and for all experiments in the BaseDimuNeuU analysis (middle).The right panel shows the distribution of the variable SE from the BaseDimuChorus analysis.

8
GeV 2 22e predictions for the nuclear correction factors for the neutral and charged current DIS are shown in Figs.22

TABLE IX .
Values of all parameters of the up-quark valence distribution uv in all fits quoted here.Values in bold belong to free parameters which were allowed to vary in the corresponding analysis.