Neutrino-Nucleon Cross-Section Model Tuning in GENIE v3

We summarise the results of a study performed within the GENIE global analysis framework, revisiting the GENIE bare-nucleon cross-section tuning and, in particular, the tuning of a) the inclusive cross-section, b) the cross-section of low-multiplicity inelastic channels (single-pion and double-pion production), and c) the relative contributions of resonance and non-resonance processes to these final states. The same analysis was performed with several different comprehensive cross-section model sets available in GENIE Generator v3. In this work we performed a careful investigation of the observed tensions between exclusive and inclusive data, and installed analysis improvements to handle systematics in historic data. All tuned model configurations discussed in this paper are available through public releases of the GENIE Generator. With this paper we aim to support the consumers of these physics tunes by providing comprehensive summaries of our alternate model constructions, of the relevant datasets and their systematics, and of our tuning procedure and results.


I. INTRODUCTION
GENIE is an international collaboration of scientists working on a global analysis of neutrino scattering data and on the incorporation of modern theoretical inputs and experimental data into robust and predictive semi-empirical comprehensive neutrino interaction simulations. GENIE develops and maintains a suite of well-known software products for the experimental neutrino community, which includes its popular Generator product [1]. With the recent release of the GENIE Generator v3, a substantial change in the way that the GENIE Collaboration approaches the process of developing, validating, characterising, tuning and releasing comprehensive neutrino interaction simulations came into sharp focus. The focus of the GENIE Collaboration has always been the development of universal comprehensive models, handling all probes and targets and simulating all processes across the entire kinematic phase space relevant for neutrino experiments. Previously, the GENIE Collaboration released a sin- * Now at University of Cambridge † Now at Virginia Polytechnic Institute and State University gle, preferred (default) comprehensive model that reflected our current understanding on the most predictive, robust, and self-consistent model that could be built out of GENIE neutrino interaction modelling elements. Whereas many other alternative modelling elements were made available to users, they had to be enabled by individual users through an error-prone procedure that could bring substantial physics and logical inconsistencies, invalidate procedures for addressing double counting issues, and damage the level of agreement with data, often in ways that were unsuspected by users that had a narrow focus on some particular modelling aspect and lacked the GENIE tools and procedures to fully characterise a comprehensive model. To address this, and in response to the community demand for alternative models, GENIE has released a number of comprehensive model configurations (CMCs) and is in the process of constructing several more. All such configurations, that are easily invoked and run out of the box, combine modelling elements in a way that is as consistent as possible, and are validated, characterised and tuned as a whole. This important development was underpinned by a substantial upgrade of GENIE capabilities for systematic model validation, model characterization through comparisons to large collections of complementary scatter-ing data with neutrino, charged lepton, and hadron probes, and the development of an advanced global analysis of scattering data. The GENIE global analysis was made possible through the continued development of curated data archives, and the successful large-scale refactoring and interfacing to the Professor tool [2] of a very extensive set of GENIE codes that implement comparisons to data, within a framework that allows the efficient manipulation of large ensembles of simulated events produced from a constellation of alternative models. The interface to the Professor tool enabled the efficient implementation of complex multi-parameter brute-force scans and removed substantial global analysis limitations by decoupling it from event reweighting procedures that, for all but the most trivial aspects of our physics domain, require substantial development time and are not exact, or even possible at all. Professor 'reduces the exponentially expensive process of brute-force tuning to a scaling closer to a power law in the number of parameters, while allowing for massive parallelisation' [3]. The Professor package has been extensively used for the tuning of Monte Carlo generators in the collider community.
The above developments allowed the GENIE Collaboration to fulfil its dual purpose described in its mission statement: GENIE develops a popular Monte Carlo event generation platform and implements, within its platform, universal and comprehensive physics simulations for lepton scattering, as well as simulations for several Beyond the Standard Model processes. But, in addition, and separately from the previous mission, GENIE develops a global analysis of scattering data for the tuning and uncertainty characterization of comprehensive neutrino interaction models. The GENIE Generator is the main outlet for the GENIE global analysis results, and our goal is that, for each supported comprehensive model, several selected tuned versions shall be made available.
Typically, nuclear modifications to the crosssection are computed separately, and the decomposition of the total cross-section into the possible exclusive final states proceeds via separate hadronization, intranuclear rescattering and particle decay codes. Therefore, bare-nucleon cross-section are a crucial first modelling component to tune in the process of building a global fit of all relevant scattering data. Tunes for several aspects of GENIE modelling, including neutrino-induced hadronization and nuclear cross-sections for low-multiplicity channels, are near completion and will be released and published in the future. This paper summarises the results of the first analysis performed within the GE-NIE global analysis framework, revisiting the GE-NIE bare-nucleon cross-section tune and, in particular, the tuning of the empirical non-resonance background contribution to one and two pion final states. A similar, albeit much simpler, analysis underpinned the tune of the well known and widely used comprehensive model that was included as the default model throughout the very long GENIE v2 series of releases. At that time, not sufficiently explored and understood tensions between inclusive and exclusive data, and an executive decision to anchor the GENIE v2 model on inclusive data, led to some expected and well known discrepancies with exclusive data that were increasingly brought into focus as new experiments started performing increasingly precise measurements of low-multiplicity, exclusive final states [4]. Here, we perform a careful investigation of the observed tensions between exclusive and inclusive data, retune the bare-nucleon crosssection model for all GENIE comprehensive models available in GENIE v3, and provide best-fit values and correlations for several parameters influencing the GENIE bare-nucleon cross-sections. The work presented here was based on the model implementations of GENIE v3.0.6 (released on 23 July 2019), and the results of this work will be included in the GENIE v3.2.0 release. Preliminary versions of this work appeared in earlier releases of the GENIE v3 series (v3.0.0 -v3.0.6).
In Sec. II, we summarise relevant aspects of the free nucleon cross-section modelling in GENIE, while in Sec. III we provide further details for the construction of comprehensive GENIE models considered in this work. In Sec. IV B we provide details of the datasets, parameterisation of model and data uncertainties for this particular tune. Sec. V describes the tuning procedure as well as the statistical methodology used. Finally, our tuning results are presented in Sec. VI.
In the version of GENIE used in this work, there is the option to select one of several neutrino-induced resonance production calculations performed by Rein and Sehgal [5], Kuzmin, Lyubushkin and Naumov [6,7], and Berger and Sehgal [8]. The last two models are extensions of the first one, that account for nonzero lepton masses. Both models are based on the same formalism and the only difference between them is that the latter includes the pion-pole contribution to the hadronic axial current. The term d 2 σ DIS /dQ 2 dW represents the GENIE calculation of the deep inelastic cross-section that, in all relevant GENIE comprehensive model configurations, is carried out using an effective leading order model with the modifications suggested by Bodek and Yang [9] to describe scattering at low momentum-transfers. This model is the foundation of both the DIS model and the SIS model in GENIE.
The term d 2 σ SIS /dQ 2 dW requires some elaboration. It represents the cross-section contribution from non-resonance shallow inelastic scattering in the resonance region. In GENIE, this cross section is computed with an empirical model where the Bodeck and Yang inclusive deep inelastic cross section is extrapolated into the resonance region and it is decomposed, via the GENIE AGKY [10] hadronization model, into the cross sections for different hadronic multiplicity channels. The extrapolation of the DIS model down to the inelastic threshold, W < W cut , includes, on average, the effect of the resonances [11]. Notice that, even though the Bodeck and Yang model is capable of describing the inclusive cross section at inelastic threshold, we prefer to utilize an explicit resonance model. The contribution for hadronic multiplicities 2 and 3, that are responsible for producing many final states similar to those produced via resonance excitation, are tuned to remove double counting. This tuning is the main topic of this work.
The non-resonance shallow inelastic scattering   cross section can be written as whereσ DIS represents the extrapolated deep inelastic cross section into the resonance region, and m refers to the multiplicity of the hadronic system. The factor f m relates the total calculated DIS cross section to the DIS contribution to this particular multiplicity channel. These factors are computed as where R m is an adjustable parameter and P had m is the probability, taken from the GENIE hadronization model, that the DIS final state hadronic system multiplicity would be equal to m, The average hadronic multiplicity m is computed, and the function ψ has the following asymptotic form In the above expressions, α, β, β and c are adjustable parameters. In principle, α, β, β c and R m , are different for each initial state (ν + p, ν + n, ν + p,ν + n) and are different for charged current and neutral current interactions. A new tune of the neutrino-induced hadronization models in GENIE is currently in progress and, in future, it may be possible to perform a joint tuning of the GENIE cross section and hadronization modeling components for bare-nucleon targets. However, at this present work, the parameters α, β, β and c were kept at the default values of the AGKY model in GENIE v3. For easy reference, the relevant values for the channels studied in this work are included in Tab. I. No dependence on Q 2 has been observed in ν andν scattering data [12], hence β = 0 for all channels. For most inelastic processes simulated in neutrinonucleus scattering by all current GENIE comprehensive model configurations, the total inelastic differential cross section for scattering off bare nucleons takes centre stage. In Fig. 1, the contribution to the ν µ CC andν µ CC inclusive cross sections on isoscalar targets in GENIE is shown for the different interaction processes. The CC RES and SIS/DIS CC cross-section contribution for different neutrino energies is shown in Fig. 2.

III. COMPREHENSIVE MODEL CONFIGURATIONS IN GENIE V3
GENIE has a large degree of configuration: for each process (RES, DIS, etc.) the system offers a number of alternative models to be used for event generation. In previous GENIE releases, only one model-process mapping was suggested by the outof-the-box configuration, despite the availability of alternative models. Yet, there was no guidance on how to correctly use different configurations according to author and developers. In fact, processes are not universal and their definitions are generator dependent. Hence, it was easy to come up with inconsistencies between the model configuration for different processes that were supposed to be used together to get a correct comprehensive physics simulation.
This issue was addressed in GENIE v3 by introducing the concept of comprehensive model configuration (CMC) that is a consistent process-model association. Considering that GENIE already has about 20 different processes only for neutrinos, CMC definitions are quite complex objects and they need to be effectively named so that the community can use them unambiguously. For this purpose, the collaboration developed a specific naming convention, see Sec. III A. Sec. III B describes the models used in CMCs relevant for neutrino interactions. • dd is a number describing the year during which the model configuration was first developed.
• v is a character (a, b, c, ...) enumerating different members of the given family of model configurations.
Once a comprehensive model configuration is defined, a number of different tunes may be produced. These may be produced, for example, by a) incorporating different combinations of experimental data, b) considering variations in different combinations of our modelling elements (e.g. bare-nucleon cross sections, nuclear model and nuclear cross sections, neutrino-induced hadronization etc), c) considering • PP is a number identifying the set of tuned parameters. This parameter set is defined uniquely only in the context of a particular model configuration.
• xxx is a number that identifies the dataset used for the model configuration tuning. This may include a unique set of weights associated with each component dataset.

B. CMCs available in GENIE v3
Several CMCs are available in GENIE v3, but they can be grouped together as their scopes are common. The first group of CMCs is historically motivated: it is based on the default configuration and simply provides updates for processes that were introduced later. The second family is an improvement of the first group, in terms of the resonance model. The third one was constructed aiming to deliver the most up to date theoretical nuclear matter simulations. Out of these main ideas, a number of CMCs can be constructed simply changing minor aspects like FSI or form factors. Here, we briefly summarise the modeling components used in each comprehensive model configuration available in GE-NIE v3.
1. G18 01a, G18 01b, G18 01c and G18 01d These comprehensive models share an identical cross-section model construction, which is an adiabatic update of the historical default cross-section model of GENIE v2, now named as G00 00a CMC. For interactions on nucleons and nuclei, it relies on implementations of the following models: the Ahrens model [13] for NC elastic, the Llewellyn Smith model [14] for CC quasi-elastic, the Rein-Sehgal model [5] for NC and CC resonance production, the Rein-Sehgal model [15] for NC and CC coherent pion production, the Bodek-Yang model [9] for NC and CC deep inelastic scattering and nonresonance shallow inelastic scattering, the Kovalenko model [16] for quasi-elastic charm production, and the Aivazis-Olness-Tung slow rescaling model [17] for deep inelastic charm production. Nuclear cross sections are calculated within the framework of a rel-ativistic Fermi gas model, following the approach of Bodek-Ritchie [18]. Multi-nucleon processes in neutrino scattering off nuclear targets can be optionally enabled and simulated via an empirical GENIE model [1]. In addition, in GENIE v3, the adiabatic upgrade of the historical comprehensive model includes the simulation of processes that, previously, were either optional or missing. This includes both diffractive pion production based on an implementation of the Rein model [19], and quasi-elastic |∆S| = 1 hyperon (Λ 0 , Σ − , Σ 0 ) production based on the Pais model [20]. Single Kaon production, although optionally available for neutrinos in GENIE v3 [21], is not yet available for antineutrinos and inclusion in any published GENIE comprehensive configurations was postponed till an antineutrino implementation is available and the Kaon content of hadronic showers produced by GENIE has been retuned following the addition of the single-Kaon generator. Both G18 01a and G18 01b comprehensive models employ a revised resonance decay algorithm and an implementation of the AGKY [10] hadronization model that is unchanged with respect to that used at the latest releases of GENIE v2 series. Four comprehensive model variations are constructed by attaching different intranuclear hadron transport models to the same underlying cross section and hadronization models [22]. G18 01a uses an updated IN-TRANUKE hA effective intranuclear rescattering model which is unique to GENIE, G18 01b uses the new INTRANUKE hN model implementing a full intranuclear cascade including medium corrections, G18 01c uses an interface to the GEANT4 [23] Bertini intranuclear cascade [24] (version 4.10.2) and G18 01d uses an interface to the INCL++ (version 5.2.9.5) implementation of the Liège intranuclear model [25].
2. G18 02a, G18 02b, G18 02c and G18 02d This is family of empirical models which is an evolved version of the G18 01[a-d] ones. The general construction of the cross-section model is similar to the one discussed above, with the exception that the implementations of the Rein-Sehgal models for CC and NC resonance neutrino-production, as well as for CC and NC coherent production of mesons, were replaced with updated models by Berger-Sehgal [8]. Similarly to G18 01[a-d], four comprehensive model variations are constructed by using alternative intranuclear hadron transport models on top of the same underlying cross section and hadronization models (a: INTRANUKE/hA, b: INTRANUKE/hN, c: GEANT4/Bertini, and d: This is a family of models derived from the improved empirical ones (G18 02[a-d]) described above, by substituting both the Llewellyn Smith CC quasi-elastic model [14] and GENIE's empirical multinucleon model with implementations of the corresponding Valencia models by Nieves et al. [26]. This family of comprehensive models provides a firmer theoretical basis for the simulation of neutrino-nucleus scattering around the quasi-elastic peak. Within this family of models, the nuclear environment is modelled using a Local Fermi Gas, matching the inputs used for the published Valencia calculations. Again, four comprehensive model variations (a-d) are constructed by using alternative intranuclear hadron transport models, following the same naming convention introduced above. The implementation of the Valencia model in GENIE does not predict the kinematics of the outgoing hadrons and its description needs to be accompanied by one of the FSI models available in GENIE (a-d) [27]. 4. G18 10i, G18 10j, G18 10k and G18 10l These comprehensive models are derived from the G18 10[a-d] ones, by replacing the dipole axial form factor, used in the calculations of quasielastic cross sections, with the better-motivated zexpansion model [28], providing a richer set of degrees of freedom for parameterizing quasi-elastic model uncertainties.
As in all previous families of models, four comprehensive model variations (i-l) are constructed by using alternative intranuclear hadron transport models, though the labelling is now different (i: INTRANUKE/hA, j: INTRANUKE/hN, k: GEANT4/Bertini, and l: INCL++).

C. Free nucleons and CMCs
Although a large number (16)  Several variations of the tuning procedure were run and evaluated for testing purposes, before converging to the procedure presented in this paper. Preliminary versions of this work were released in the GENIE v3 series (v3.0.0-v3.0.6) in a series of tunes carrying the 02 11a label. The final results presented in the paper will be made available in GE-NIE v3.2 in a series of 16 tunes, one for each of the 16 comprehensive model configurations summarised above, labelled as 02 11b. For example, the tune G18 10a 02 11b corresponds to the G18 10a comprehensive model with the parameters determined through the tuning procedure discussed in this paper (02 11b). The GENIE tune naming convention was discussed in an earlier section. A full list of GENIE tunes is maintained in http://tunes. genie-mc.org. The preliminary versions (02 11a) of the tunes will be kept in GENIE v3.2, but they will be phased out in subsequent minor releases.

IV. DATA AND MODEL UNCERTAINTIES REVIEW
The data used in this analysis are old and a careful review of the past analysis procedure is required in order to combine all the data together in a global analysis. This section summarises the data details and how the models used in the fit behave in the same energy region.

A. Datasets included in the fit and their systematics
In the current work, we consider Hydrogen and Deuterium data from the ANL 12FT, BNL 7FT, FNAL 15FT and BEBC bubble chamber experiments. The data represent integrated cross sections for different incoming neutrino energy bins for • ν µ andν µ CC inclusive scattering  • ν µ andν µ CC quasi-elastic scattering [29,39,53,[57][58][59][60][61][62][63][64][65] • ν µ andν µ CC single-pion production [58,[66][67][68][69][70][71][72][73][74]] Not all of the available historical data has been used for the fit, as some datasets were superseded or reanalysed, as in the case of ANL 12FT and BNL 7FT, datasets. The latest analysis are used. A detailed summary of the datasets used in the fit is shown in Tab. II and in Fig. 3. Some of the datasets included in the tune consider Hydrogen-Neon mixtures. The nuclear effects of the neon in the target mixture are shown to be negligible [76].
Low energy bins have a higher contribution to the χ 2 due to energy smearing and lack of unfolding in measurements. Hence, data points with E ν < 0.5 GeV are removed from the fit. In total, the tune is performed with 169 data points from bubble chamber experiments. Different analysis methods were implemented in each experiment, such as cuts applied on the W invariant mass, the outgoing muon momentum or the total longitudinal momentum of the final state. The associated GENIE prediction has been corrected by applying the same cuts to the generated events. Moreover, datasets from the same experiments are not independent as they share the same neutrino flux, detector, analysis methodology, etc. Although it is clear that some correlated uncertainties exist, the data releases do not contain any information about the correlation between them. In the GENIE database, we added a systematic error to the datasets of 15%. The methodology used to include them in the fit is detailed in Sec. V C. Other free nucleon data on heavier targets are available but used only for comparison with the GENIE prediction. No correction for nuclear effects is considered for deuterium targets.

B. Model uncertainties
The SIS cross section is tuned within the CMCs using either the Rein-Sehgal or Berger-Sehgal resonance models, see Sec. III B. The tuning main goal is the best value estimation for nine of the parameters that drive the GENIE predictions in the SIS region. These parameters are the W cut as defined in Eq. 1, the four R m coefficients for CC interactions on neutron/proton with m = 2, 3 from of the SIS region (Eq. 3), the axial masses used in the dipole form factors for RES and QE interactions, and 2 global scaling factors for the RES cross section and the DIS cross section. For clarity, we will refer to R m parameters with the number of pions in the final state, namely R CC1π   (a) νµ CC cross section. Breakdown of quasi-elastic, one and two pion production and deep inelastic processes is shown. The predictions are computed using the G18 02a 00 000 configuration. The data on hydrogen and deuterium targets from Tab. II is shown if available from ANL 12FT ( ), BNL 7FT (•), BEBC ( ) and FNAL ( ).
summarises the parameter pre-fit values and the allowed ranges. Previous fits to data are taken into account for the determination of the ranges [78,79].
Each of the parameters have a different sensitivity to each dataset, as different scattering mechanisms are involved. The response of each parameter in the inclusive and exclusive cross sections is studied by varying each of them independently within the studied range. In Fig. 4, each parameter response is shown for inclusive and exclusive cross sections. When more than one parameter in the plot is impacting the same cross section, i.e. CC inclusive, the variations are added in quadrature.
At the Monte Carlo level, where no correlation between the parameters is considered, the impact of each of the parameters in the cross section can be classified as influencing a variation on will only affect the quasi-elastic cross section prediction, as summarized in Fig. 4a. Notice though that, at the tune level, this will no longer hold as the introduction of flux nuisance parameters correlates exclusive channels. Hence, this will introduce a correlation between M QE A and the SIS parameters.
The description of the CC RES cross section will be affected by the RES axial mass M RES A (Fig. 4a), the resonant scaling parameter S RES (Fig. 4b), and W cut (Fig. 4d). The default G18 01a and G18 02a configurations overestimate one-pion production processes, and would favor a reduction in the CC RES cross section. Variations of M RES A have a huge impact on both exclusive and inclusive CC cross sections in the few-GeV region. However, as it is explained in Sec. V C, this parameter should agree with the world average extracted from fits to the axial form factor [78] and a deviation from this result is disfavoured by previous fits to data. Consequently, a reduction of S RES is expected to improve the agreement with one-pion production data. On the other hand, W cut will play an important role as it determines the number of resonances included in the CC RES calculation. The current default, W cut = 1.7 GeV/c 2 , discards the resonances contributing at W > W cut . Therefore, an increase on W cut will incorporate new resonances in the calculation that were not taken into account in previous tunes. This increase is favoured by two-pion production data, as heavier resonances producing more than one pion are incorporated.
The SIS region is treated by combining two crosssection models, one for DIS and one for RES interactions. Thus, in that region, many parameters have a visible effect on the predictions as can be seen in Fig. 4: regardless of the parameter considered in the plot, there is always a visible error band in the few-GeV region. This is a clear hint for the presence of degeneracy that must be faced by our global tunes. An example of this is given by the R n and the S DIS parameters, which act as scaling factors for the DIS contribution at W < W cut. As mentioned above, a desired result of the tune is to reduce the one-pion prediction and increase the two pion production. This can be accomplished via alterations     of either the R n and/or the S DIS parameters.

V. BARE-NUCLEON CROSS-SECTION TUNING PROCEDURE
This section describes the core ideas behind the paper. Most of these are not specific for this work: they are general concepts developed within the GE-NIE tuning system and can therefore apply to future tune releases.

A. Likelihood construction
The GENIE integrated cross-section prediction is denoted with σ i th (E k |θ), where E k is the neutrino energy, θ is a vector [81] of the adjustable physics parameters introduced in Sec. IV B, and i is any of the 10 reaction processes considered in the work presented in Tab. II. Using σ i th (E|θ), we produce the corresponding prediction for the k-th energy bin of the j-th dataset for the i-th reaction type, where ε ij (E k , θ) are dataset-dependent efficiencies expressing the fraction of events from the i-th process that survive the kinematical cuts imposed by the experiment, see Tab. II. The statistical error due to the MC sample size is also evaluated and this is denoted δσ ij (E k |θ).
Performing a multi-parameter brute-force scan and tune using σ ij th (E k |θ) is computationally inefficient. As was highlighted in the introduction, the GENIE global analysis framework relies on Professor [2] to reduce the computational complexity of brute-force scans while allowing for massive parallelisation. Using the values of σ ij th (E k |θ) computed for a number (N R ) of randomised P-dimensional vectors θ, produced within the P-dimensional hypercube defined by the parameter ranges given in Tab. III, we use Professor to generate a parameterisation of σ ij th (E k |θ) and δσ ij (E k |θ) that will be denoted with σ ij th (E k |θ) and δ σ ij (E k |θ) respectively. As discussed in [2], the parameterisation is a generic polynomial of order M in the P-dimensional space, whose analytical form is where θ n is the coordinate of the n-th parameter. The polynomial order M is set by the user. The coefficients α ijk 0 , β ijk n , γ ijk (nm) , . . . , ξ ijk (n1...n M ) are determined by Professor fitting the parameterisation against the computed σ ij th (E k |θ). In the analysis presented here, a 4 th order polynomial was used for the G18 01a comprehensive model configuration while a 5 th order polynomial was used for G18 02a. Particularly, N R = 1500 for G18 01a and N R = 2183 for G18 02a. The accuracy of the parameterisation is demonstrated in the residual distributions shown in Fig. 5. The parameterisation σ ij th (E k |θ) is used instead of the exact predictions in order to to estimate the best-fit parameters by minimising the χ 2 .

B. Treatment of systematic uncertainties
A number of nuisance parameters, each with a corresponding prior, can also be used to tackle the problem of correlated datasets. As seen in Sec. IV A, there are different datasets coming from the same experiments (ANL 12FT, BNL 7FT, BEBC, and FNAL 15FT). Each of these experiments share the same flux (from either a neutrino or an antineutrino beam), analysis procedure, etc. Therefore, there is a correlation between the datasets, even though it has not been quantified in the data releases. A possible approach is to add nuisance parameters that can connect datasets from experiments that used the same neutrino beam [82]. As the main systematic uncertainty comes from the fluxes, the nuisance parameters will act as scaling factors for our predictions ( σ ij th (E k |θ)) and are same for datasets sharing the same flux.
Some of the ANL 12FT and BNL 7FT data were already corrected for the flux normalization [68]. Due to this correction, the associated systematic error is smaller and, accordingly, a more restricted nuisance parameter is applied to the re-analysed datasets. These restricted parameters take into account other common systematics like reconstruction procedures, so they multiply all the predictions related to the same experiment. At the end of this procedure, each prediction can be scaled by up to 2 nuisance parameters, one for the flux and one for the remaining systematics. Thus, a total of 9 independent nuisance parameters are used to account for the correlation between datasets. They are all the available combinations of experiment and neutrino flux exposure (ν µ andν µ ) plus the restricted parameters for re-analysed data: Quasi-elastic data for hydrogen and deuterium targets is included in the tune in order to constrain the nuisance parameters. Even though quasi-elastic data is not directly constraining the SIS parameters, it plays an important role to further constrain the fluxes of each experiment, as it is known at the 15 % level.
The main advantage of this method is the unbiased choice of the nuisance parameters, as their values will be determined by the minimization of the likelihood function. For the calculation of best-fit points and the calculation of intervals, these nuisance parameters are profiled (on every instance of our fit, they are eliminated by substituting them with the value that minimises χ 2 ).

C. Discussion of priors
The likelihood is corrected using priors on parameters of interest (θ) and nuisance parameters (f ). Priors allow us to incorporate in this analysis the appropriate pre-fit uncertainties and correlations for the parameters of interest. Only Gaussian priors are considered at present.
The priors applied to each nuisance parameter f j have a peak at 1 and different standard devi-TABLE (IV) Nuisance parameters, f j , per experiment (ANL 12FT, BNL 7FT, BEBC or FNAL 15FT) and neutrino beam (ν µ orν µ ). Priors consider the systematic uncertainty applied to each dataset as δf j , where j is one of the datasets under study. The allowed range is [0, 2] for nuisance parameters considered in the tune.
Parameter Prior ations δf j . In general the total scaling factor applied to non-re-analysed datasets are constrained by a conservative 15 % δf Gaussian prior, except for those nuisance parameters that act on the same experiment. Thus, the BEBC and FNAL 15FT experiments have only one associated scaling factor δf BEBC = δf FNAL = 0.15 for both neutrino and anti-neutrino fluxes; the same is true for f BNL (ν µ ). Up to two nuisance parameters can be applied to ANL 12FT and BNL 7FT data (i.e. f AN L (ν µ ) and f AN L Re (ν µ )). The ANL 12FT and BNL 7FT restricted nuisance parameters, f ANL Re (ν) and f BNL Re (ν), have δf = 5%. δf ANL and δf BNL are such that ANL 12FT and BNL 7FT non-re-analysed datasets data are constrained by an overall 15% Gaussian. The full summary of the nuisance parameters is in Tab. IV.
Priors are applied to the parameters of interest to penalize disagreement with well-established parameter values. For instance, the description of neutrino CC quasi-elastic cross sections and single-pion production through baryon resonances is strongly determined by the shape of the weak axial and vector transition form factors. For the G18 01a(/b/c/d) and G18 02a(/b/c/d) CMCs, the axial form factors are described using the dipole parameterisation which is a function of the invariant transferred momentum (Q 2 ): with F A (0) = g A = −1.2695 ± 0.002 [83]. The extraction of these parameters requires neutrino differential cross sections as a function of Q 2 that are not used in this analysis. Our goal is not the extraction of the axial masses but the better estimation of the cross section at the SIS region. For this reason, these values are used as priors in our global fits. Another parameter of interest which is strongly constrained by data is the S DIS parameter. This parameter dominates the cross-section behaviour at high neutrino energies. Most of the data in that energy range comes from neutrino interactions with heavy nuclear targets and are therefore not included in the fit. A Gaussian prior is considered to ensure that agreement with these data are preserved [84] by our tuning procedure. This would not be the case otherwise as the SIS region data would prefer much higher cross-section values for the DIS contribution. The prior on S DIS provides a good solution for this problem because the degeneracy between DIS and non-resonant background parameters gives us multiple ways to accommodate good agreement between data and GENIE predictions in the SIS region. In other words, the introduction of the S DIS prior breaks the degeneracy without adding more datasets to the fit.

D. Final form of the χ 2
Including all of the contributions from the previous sections and defining σ ijk d (δσ ijk stat ) as the data central value (statistical error) corresponding to the σ ij th (E k |θ) prediction, the complete form of our χ 2 distribution becomes: where φ j (f ) is the product of the nuisance scaling factors that are relevant for j-th dataset as described in Sec. V B. θ 0 and Σ θ are the central values and the covariance matrix of the priors for the parameters of interest, respectively. Equation 10 represents the full capability of our tuning machinery. However, the priors we applied for the present work were uncorrelated and so only the diagonal entries of Σ θ were used. The details on the priors applied in this analysis are described in Sec. V C. The contribution of each point to the likelihood can be (de-)emphasized using weights w ijk to set the relative importance of different datasets (or of individual data points within a dataset). Such weighting schemes have been used extensively in generalpurpose event generator tunes for the LHC (for an example, see [85]). In this particular analysis, the weights are used to include or exclude datasets only (w ijk ∈ {0, 1}).

VI. TUNING RESULTS
In order to properly understand the global tune, the tensions between datasets must be discussed. These tensions are studied by performing fits using a specific dataset to evaluate the impact of the partially-fitted predictions on the rest of the datasets included in the global tune.

A. Partial fits
Two main subsets were identified in the global dataset in order to study tensions: inclusive and exclusive datasets. The fits consider the G18 02a CMC as the base configuration and include nuisance parameters to take into account the correlation between datasets from the same experiment, see Sec. V for more details. No priors on M RES A and M QE A are applied as we are interested to see the impact of each subset on the prediction. The fit to inclusive data only is not sensitive to the scaling multiplicity parameters for the non-resonant background, therefore those parameters are fixed to their default values during the fit.
Partial fit results for inclusive and exclusive data are presented in Tab. V. The tune against inclusive data only achieves much better agreement with in-clusive data than the previous GENIE G18 02a default, see Fig. 6. This difference between the old and new inclusive tune is due to 1) the inclusion of only hydrogen and deuterium datasets and 2) the effect of the nuisance parameters. [86] Particularly, without exclusive data, a small reduction of the resonant cross section is already observed in the CC RES region. The result for M RES A is consistent with previous results without the addition of priors [78]. W cut is pulled to the lower edge of the parameter range: the parameter uncertainty could not be estimated as the χ 2 minimum was found on the contour.
As expected, the fit to exclusive data only is able to correctly describe exclusive datasets for one and two pion production. The low cross-section data for one pion production forces all the relevant parameters to decrease with respect to the default values see Fig. 7a. At the same time, two pion production data forces R CC2π νp , R CC2π νn and W cut to increase in order to match two pion production data, see Fig. 7b. The agreement with ν µ CC inclusive data is worse, see Fig. 6, but the compatibility is still acceptable given the large uncertainties on the data in that region. On the other hand, the partial fit does not obtain a good prediction for M RES A . W cut is fixed to its maximum value of 2 GeV to avoid nonphysical regions.
The exclusive fit clearly shows a preference for a larger total cross section in the neutrino energy region between 1 and 10 GeV due to the high value of R CC2π νp and R CC2π νn . This is a tension between exclusive and inclusive datasets as the inclusive prefer a lower value in that E ν region. Since inclusive data constitute about 40% of all the data points, the inclusion of priors for the axial masses and S DIS becomes crucial to overcome the tension [87].

B. Global fit
The analysis procedure outlined in previous sections was applied to the comprehensive model configurations listed in Sec. III. The best-fit parameter values obtained from the GENIE analysis for each alternative CMC are shown in Tab. VI and Tab. VII. The GENIE v3 cross-section curves that correspond to the two sets of tuned parameters are shown in Figs. 8, 9, 10, 12 and 11. For reference, we also show the cross-section predictions made by the default G18 02a CMC that is available in the last public release of the GENIE v3 series (3.2).
For all CMCs the tune has the most impact on the SIS region. In the inclusive cross-section prediction, this translates into a decrease of both ν µ andν µ CC inclusive cross section in the 0.5-10 GeV region, see Fig. 8. At the same time, the cross section at higher The best-fit values obtained for the G18 02a(/b) CMC can be used for the G18 10a(/b) as the same bare-nucleon underlying models are used. Moreover, for the G18 10i(/j) CMCs, the best-fit values from the G18 02a(/b) tune can also be used, except for M QE A , as the quasi-elastic axial form factor is parametrised with the z-expansion model instead of a dipole and the corresponding z-expansion parameters are kept to the default values.  neutrino energies has barely changed, respecting the constraints of high-energy data. The agreement with quasi-elastic data, included in the tune in order to constrain the fluxes of each experiment, remained the same, see Fig. 9.
As discussed in Sec. VI, this decrease of the inclusive cross section at the SIS region is driven mainly by one pion production data. The impact on one  pion exclusive channels is shown for (anti)neutrino on proton, Fig. 10, and neutrino on neutron, Fig. 11. The reduction of the one pion production cross section for neutrino on proton and neutron shows an improvement on ν µ CC1π + , ν µ and ν µ CC1π − and ν µ CC1π 0 channels when comparing it with the available data.
Two pion production exclusive cross sections are summarized in Fig. 12. This is the first time that two pion production data are used to tune the SIS region, allowing the R CC2π νp and R CC2π νn parameters to be constrained. In this case, the two pion exclusive cross section was underestimated by the default tune. For this particular exclusive process, comparisons are made against ν µ CCπ + π + , ν µ CCπ + π 0 and ν µ CCπ + π − data. The shape of the GENIE prediction for the ν µ CCπ + π + and ν µ CCπ + π 0 channels differs strongly from data, and the models are not able to accommodate this behaviour. However, the agreement with ν µ CCπ + π − data has improved by increasing the cross section with respect to the default cross-section model.
Despite the tensions between inclusive and exclusive data discussed in Sec. IV B, the overall agreement for both cross-section model constructions has improved, see Tab. IX. Particularly,ν µ CC inclusive predictions show better agreement after the tune, and the same is observed for ν µ CC inclusive predictions for the G18 01a free nucleon tune. Although the impact on the cross-section prediction of the    8) Best fit prediction impact on muon (anti)neutrino CC inclusive cross sections as a function of the neutrino energy (E ν ). The associated predictions for the default G18 02a and tuned G18 02a and G18 02a are computed with GENIE v3.0.6. Predictions are compared against all the available data (anti)neutrino interactions on H, 2 H and heavier targets. Both CMC have been tuned against some H, 2 H data (filled markers). Each χ 2 is computed using all data available. In Tab. IX, the χ 2 values per dataset are specified. tune is similar for the existing configurations, the response of each model at the parameter level is not expected to be the same. Therefore, each tune is strongly affected by how the model is able to accommodate the data by modifying the tuned parameters. This reflects on the R m parameters and W cut which best fit values are incompatible in some cases, such as for R CC1π   FIG. (9) Best fit prediction impact on muon (anti)neutrino CC quasi-elastic cross sections as a function of the neutrino energy (E ν ). The associated predictions for the default G18 02a and tuned G18 02a and G18 02a are computed with GENIE v3.0.6. Predictions are compared against all the available data (anti)neutrino interactions on H, 2 H and heavier targets. Bot CMC have been tuned against some H, 2 H data (filled markers). Each χ 2 is computed using all data available. In Tab. IX, the χ 2 values per dataset are specified.
pression, the function ∆χ 2 profile (θ i ) is constructed by fixing θ i to a desired value and minimising the quantity ∆χ 2 (θ, f ) = χ 2 (θ, f ) − χ 2 min with respect to all other parameters that were allowed to float in the fit. See Sec. V A for the definition of χ 2 (θ, f ). The constant χ 2 min corresponds to the minimum value of χ 2 (θ, f ) obtained from the global fit. The ∆χ 2 profile (θ i ) functions we derive from our analysis are shown in Fig. 13, for all parameters θ i that were allowed to float in the fit, up to ∆χ 2 profile values of 2.
Particularly, W cut is fixed to the best fit value during this approach, as it is an ad-hoc parameter introduced by the generator: by fixing it, its uncertainty will be reflected on the other parameters. It is important to emphasize that the uncertainties quoted relate only to ∆χ 2 critical = 1. However, this region is strongly determined by the underlying model used in the tune.
A covariance matrix is also obtained through the inversion of the the Hessian of the log-likelihood    10) Best fit prediction impact on muon neutrino on proton CC one pion production cross sections as a function of the neutrino energy (E ν ). The associated predictions for the default G18 02a and tuned G18 02a and G18 02a are computed with GENIE v3.0.6. Experimental cuts are also applied to the predictions when needed. Predictions are compared against all the available data (anti)neutrino interactions on H, 2 H and heavier targets. Both CMC have been tuned against some H, 2 H data (filled markers). Each χ 2 is computed using all data available. In Tab. IX, the χ 2 values per dataset are specified. function at the best-fit parameter point. The corresponding correlation matrices are presented in Tab. VIIIa and Tab. VIIIb for the tunes of all 4 different cross-section model constructions used in this work (see the correlation matrices in Fig. 14 and Fig. 15 for a graphical interpretation). An example of the propagation of model uncertainties from the Professor output to the GENIE Comparisons framework is shown in Fig. 17.
Joint ∆χ 2 profile (θ i , θ j ) functions, constructed by fixing two parameters at a grid of values and mini-mizing and ∆χ 2 (θ, f ) with respect to all other new parameters, are shown in Fig. 16 for selected sets of parameters. In Figs.16, we can see that the coverage of the parameter space for the 68% and 95% confidence level lines is wider for the G18 01a(/b) tunes. This characteristic is again not related with how well we can constrain the parameters from the data, but with the capability of the models to accommodate this data in each model implementation.  FIG. (11) Best fit prediction impact on muon neutrino on neutron CC one pion production cross sections as a function of the neutrino energy (E ν ). The associated predictions for the default G18 02a and tuned G18 02a and G18 02a are computed with GENIE v3.0.6. Predictions are compared against the original and reanalized ANL 12FT and BNL 7FT data [67,68]. Only reanalized data with E ν > 0.5 GeV is used in the tune (filled markers). Each χ 2 is computed using all data available.

VII. CONCLUSIONS
GENIE has released a number of comprehensive model configurations (CMCs) which consist of different modelling aspects combined altogether. In previous GENIE versions, there was a preferred default comprehensive model which failed to describe both inclusive and exclusive channels due to unresolved tensions between the data. These tensions, which are crucial to understand for the new generation of neutrino experiments, motivated a careful investigation and retune of the bare-nucleon cross-section model for all GENIE comprehensive models available in GENIE v3. Best-fit values and correlations for several parameters influencing the GENIE barenucleon cross sections are released in this paper. (b) Comparison of νµ CC π + π 0 data on proton. (c) Comparison of νµ CC π + π − data on neutron.
FIG. (12) Best fit prediction impact on muon neutrino CC two-pion production cross sections as a function of the neutrino energy (E ν ). The comparisons to two-pion production data are shown against the default and tuned CMCs. The associated predictions for the default G18 02a and tuned G18 02a and G18 02a are computed with GENIE v3.0.6. Predictions are compared against ANL 12FT and BNL 7FT data.
In GENIE v3, we focus on improving understanding of the SIS region by tuning the GENIE CMC predictions on hydrogen and deuterium data from ANL 12FT, BNL 7FT, BEBC and FNAL 15FT bubble chamber experiments. The tuning of the nonresonant background takes a central stage in this work in order to remove double counting issues. The SIS region has been tuned against ν µ andν µ CC inclusive, quasi-elastic, one pion and two pion integrated cross sections as a function of E ν . Quasielastic data has been introduced to the fit to better constrain the flux of each experiment.
The global fit is able to describe both inclusive and exclusive cross sections simultaneously. Tensions between inclusive and exclusive data have been re-encountered, and, as a consequence, the inclusive cross section at the 1-10 GeV energy region decreased with respect to the historical default value. The systematic treatment of correlations between datasets and the inclusion of priors were crucial to address tensions. After the tune, GENIE predictions of one pion production cross sections on free nucleons (ν µ CC pπ + , nπ + , pπ 0 and pπ + π − ) show a decrease in the non-resonant background contribution,      (14) Parameter correlation matrix from the GENIE fit using the G18 01a(/b) CMC correlation matrix. improving the agreement with the data. The prediction for two pion production mechanisms is also in better agreement with data for the ν µ n → µ − pπ + π − channel by increasing the two pion production nonresonant background contribution.

VIII. ACKNOWLEDGEMENTS
We would like to thank Andy Buckley (University of Glasgow, UK) and Holger Schultz (Institute of Particle Physics Phenomenology, University of Durham, UK) for their support interfacing the Professor tool with the software products that underpin the GENIE global analysis. We are also grateful with Luis Alvarez-Ruso for insightful discussions and the review of this paper. We would like to thank the CC-IN2P3 Computing Center, as well as the Particle Physics Department at Rutherford Appleton Laboratory for providing computing resources and for their support. This work, as well as the ongoing development of several other GENIE physics tunes was enabled through a PhD studentship funded by STFC through LIV.DAT, the Liverpool Big Data Science Centre for Doctoral Training (project reference: 2021488). The initial conceptual and prototyping work for the development of the GENIE / Professor interfaces, as well as for the development of the GENIE global analysis framework that, currently, underpins several analyses, was supported in part through an Associateship Award by the Institute of Particle Physics Phenomenology, University of Durham.