Implementation of the SuSAv 2-MEC 1 p 1 h and 2 p 2 h models in GENIE and analysis of nuclear effects in T 2 K measurements

We first present the implementation and validation of the SuSAv2-MEC 1p1h and 2p2h models in the GENIE neutrino-nucleus interaction event generator and a comparison of the subsequent predictions to measurements of lepton and hadron kinematics from the T2K experiment. These predictions are also compared to those of other available models in GENIE. We additionally compare the semi-inclusive predictions of the implemented 1p1h model to those of the microscopic model on which SuSAv2 is based Relativistic Mean Field (RMF) to begin to test the validity of widelyused ‘factorisation’ assumptions employed by generators to predict hadron kinematics from inclusive input models. The results highlight that a more precise treatment of hadron kinematics in generators is essential in order to attain the few-% level uncertainty on neutrino interactions necessary for the next generation of accelerator-based long-baseline neutrino oscillation experiments.

We first present the implementation and validation of the SuSAv2-MEC 1p1h and 2p2h models in the GENIE neutrino-nucleus interaction event generator and a comparison of the subsequent predictions to measurements of lepton and hadron kinematics from the T2K experiment. These predictions are also compared to those of other available models in GENIE. We also compare the semi-inclusive predictions of the implemented 1p1h model to those of the microscopic model on which SuSAv2 is based -Relativistic Mean Field (RMF) -to begin to test the validity of widelyused 'factorisation' assumptions employed by generators to predict hadron kinematics from inclusive input models. The results highlight that a more precise treatment of hadron kinematics in generators is essential in order to attain the few-% level uncertainty on neutrino interactions necessary for the next generation of accelerator-based long-baseline neutrino oscillation experiments.

I. INTRODUCTION
The modelling of neutrino-nucleus interactions in the one-to-few GeV region is one of the most complicated issues facing current long-baseline neutrino oscillation measurements (T2K, NOVA) and is expected to be one of the limiting factors for the sensitivity of the future experiments such as DUNE and T2HK [1]. A key systematic uncertainty arises from the description of multinucleon correlations in the initial state which may induce 2-particle-2-hole (2p2h) final states. It is particularly important to understand the size of the 2p2h interaction cross section compared to the single-body contributions (1p1h) as a poor modelling of this leads to a direct bias on the reconstruction of neutrino energy and therefore must be covered with large systematic uncertainties in current oscillation analyses [2,3]. Various models [4][5][6][7][8][9][10][11][12][13][14] have been developed to described such 1p1h and 2p2h processes, in this paper we focus on the SuSAv2 models [15][16][17][18]. The SuSAv2 1p1h model, originally based on the superscaling phenomenon [19][20][21][22] shown by electronnucleus scattering data, has recently been improved through the inclusion of Relativistic Mean Field theory effects [23][24][25][26]. This model has proven its validity to describe the nuclear dynamics observed in electron-nucleus reactions while taking into account the experimentallyobserved enhancement of the transverse scaling function, compared with its longitudinal counterpart, as a genuine relativistic effect together with a careful treatment of the final-state interactions (FSI) between the outgoing nucleon and the residual nucleus. For the description of the 2p2h-MEC (meson exchange current) contributions the model makes use of the fully relativistic calculations from [27] which allows for a proper separation of neutron-proton and proton-proton pairs in the final state via the analysis of the direct-exchange interference terms [28]. The combined SuSAv2-MEC model, covering the CCQE and 2p2h channels, has been shown to be capable of reproducing the nuclear dynamics and superscaling properties observed in (e, e ) reactions [19,20,29], which serves as a stringent test for nuclear models, whilst also providing an accurate description of existing neutrino data [17,18,[29][30][31]. Up to now, SuSAv2-MEC is the only fully relativistic model that can be extended without approximations to the full-energy range of interest for present and future neutrino experiments. In this paper we present the implementation of SuSAv2-MEC 1p1h and 2p2h contributions in the GENIEv3 Monte Carlo neutrino interaction simulation and use it to better characterise nuclear effects in T2K neutrino scattering cross-section measurements.
Such implementations of the neutrino-nucleus interaction models in event generators is crucial for a variety of reasons. Firstly, a proper modelling of neutrino interactions in the simulation of oscillation experiments is needed in order to perform a correct extrapolation of the near detector constraints to the far detector in the analyses aimed at measuring the neutrino oscillation parameters. This argument is evident for experiments which use (or are planning to use) different detector technologies and nuclear targets at near and far detectors. Even in the case of two detectors exploiting the same technology, such an extrapolation is not straightforward because of the different acceptance of the two detectors, due to different size (and possibly different selections). But beyond such issues, the most complex systematic in the near-to-far extrapolation actually comes from effects which are independent on the detector technology. Due to the neutrino oscillations, the neutrino energy distribution is different in the near and far sites, therefore the cross-section must be evaluated at different energies. Moreover the near detector constrains only the product of neutrino flux and cross-section, which each extrapolates to the far detector differently. The disentangling of the two is based on a simulation (and tuning) of the flux and of the neutrino interactions.
The implementation of the neutrino interaction mod-arXiv:1905.08556v1 [hep-ex] 21 May 2019 els in generators is also essential to perform a proper comparison of such models with some of the most recent cross-section measurements. Indeed, in order to provide the most model-independent unbiased results possible, experiments prefer to measure cross sections of semiinclusive interaction topologies (e.g. charged-current with zero pions in the final state, CC0π) rather than measuring the physical interaction processes (e.g. 1p1h), thereby avoiding correcting the data for effects due to hadronic Final State Interactions (FSI) inside the target nucleus which can cause nuclear emission or absorption. Consequently, FSI effects must be added to the models in order to compare to the data. For example, a measurement of zero-pion final states contain, in addition to the bulk of 1p1h processes, further contributions from 2p2h processes and from resonant interactions with subsequent pion absorption by FSI. The latter contribution is very difficult to describe in a pure microscopic model but is included in neutrino interaction event generators [32][33][34]. Furthermore, recent experimental results focus on multidimensional and/or exclusive measurements (e.g. muon and proton kinematics and their correlations in measurements of CC0π interactions) whilst most of the available models are only able to calculate inclusive cross-sections. It is currently only by their implementation in event generators that they can be used to predict exclusive or semiexclusive final states. While such an approach relies on heavy approximations, which will be discussed in Sec. II, it is still the only option available today for the majority of models to describe fully exclusive final states and, more importantly, is the technique used in neutrino oscillation measurements. Therefore, the implementation of more sophisticated neutrino models, such as SuSAv2, even in such a very approximated approach, is important in order to improve the predictions for the oscillation measurements. The comparison to cross-section measurements is then crucial in order to estimate the systematic uncertainties induced by the usage of such approximated approaches in measurements of neutrino oscillations.
In this manuscript we present, in Sec. II, the implementation of the SuSAv2 models in the GENIE event generator, alongside a discussion of all the approximations involved. The comparison of the SuSAv2-MEC 2p2h implementation with other 2p2h implementations is then shown in Sec. III. In Sec. IV, the RMF, SuSAv2 and SuSAv2-GENIE models are compared for the 1p1h channel at T2K kinematics together with a dedicated analysis of T2K CC0π measurements with a restriction on the outgoing nucleon momentum, allowing a first test of some of the key approximations built into neutrino interaction event generators. The SuSAv2-MEC implementation is also tested against other recent T2K semi-inclusive measurements (CC0π with and without protons) in Sec. V where an analysis of the single-transverse variables is also shown. In the Appendices A-C, we test the SuSAv2-MEC implementation for T2K CC0π inclusive data and compare the SuSAv2-MEC implementation with the Nieves one in both inclusive and semi-inclusive reactions. Fi-nally, we present our conclusions in Sec. VI.
Throughout this manuscript all GENIE predictions are made using GENIE version R-3 00 02 [35] which serves as the base model for the implementations presented here. 1

II. IMPLEMENTATION IN GENIE AND THE FACTORISATION APPROACH
The GENIE event generator simulates 1p1h and 2p2h neutrino-nucleus interactions using Monte-Carlo methods to produce events at a rate which is proportional to their modelled inclusive cross section. For the newly implemented SuSAv2 1p1h and 2p2h interactions this is calculated as a double differential cross-section in momentum and energy transfer to the hadronic system (q 3 , q 0 ) by contracting a generic leptonic tensor with the hadron tensors taken from the theoretical model. The current implementation is based on the SuSAv2 hadron tensor for the 1p1h predictions and uses the one produced with the original theoretical model for 2p2h [28] before it is parameterised within SuSAv2-MEC [16,18]. The input hadron tensors are finely binned (5 MeV in the energy and momentum transfer of the interaction) and are evaluated using an interpolation method similar to the one described in [36]. The validation of these implementations is discussed in Appendix A.
For a selected inclusive cross-section, the generation of the outgoing hadronic state in GENIE (as with other neutrino interaction event generators) is entirely factorised from the rest of the procedure. For SuSAv2 2p2h interactions this is mostly based on the methods already employed by the implementation for the Valencia group's 2p2h model in GENIE [36,37] (which also uses a global Fermi gas initial-state model). The initial state nucleon momenta are chosen by independently sampling from a Fermi gas nuclear model (as was used in the theoretical model to produce the inclusive cross section prediction) before combining these nucleons into a single 'cluster'. The energy of this cluster is then modified to account for a simple constant removal energy for each nucleon. The probability of the initial nucleons being a neutron-proton or neutron-neutron (or proton-proton in the case of incoming anti-neutrinos) pair is chosen based on the kinematics of the selected inclusive interaction using the SuSAv2-MEC theoretical model [27,28]. The four-momentum transfer from the inclusive interaction is then given to the cluster and the nucleon content is 1 The recent update (R-3 00 04) contains some fixes to the Valenica 1p1h model implementation as well as to the Deltaresonance decay simulation. These changes will not affect the new SuSA model implementations at all. Although it is not expected that the changes will dramatically affect any distribution shown here, it should nonetheless be noted that the GENIE predictions we show are not from the very latest version. changed appropriately (for example for incoming neutrinos a neutron is turned into a proton) before the cluster is decayed isotropically to two nucleons in its centre of mass frame. The two outgoing nucleons are then separately propagated through a semi-classical FSI model, simulating re-interactions inside the nuclear medium thereby altering the nucleon's kinematics and potentially stimulating additional nuclear emission (of hadrons or further nucleons) and absorption 2 . The hadron kinematics for the 1p1h model are generated with a similar methodology. However, here the removal energy of the nucleon is chosen based on a momentum-transfer dependent SuSAv2 analysis [15,39,40], which represents a first step in mitigating factorisation by correlating the hadronic initial state with the interaction kinematics. The Global Fermi-gas used for the 2p2h case is also replaced with a local Fermi gas in the current version of the model implementation. Future work will aim to replace this with a RMF spectral function. The 1p1h case also demands more thought to keep the outgoing nucleon on-shell. To do this the standard GENIE approach is taken where the outgoing nucleon's momentum is altered to satisfy its dispersion relation. Momentum is then conserved by giving the appropriate amount to the nuclear remnant.
This implementation scheme produces almost identical inclusive predictions to the input model used to calculate the hadron tensor. However, the ability of this implementation to give reasonable semi-inclusive or exclusive predictions, as with all current model implementations in neutrino interaction event generators, clearly relies on several approximations and has a lot of freedoms (for example the spectral function used, the treatment of removal energy, the FSI model or how much fourmomentum is given to the nuclear remnant). Primarily, instead of computing a fully exclusive cross-section in terms of all the particles in the final state (which would require 16 hadron tensor components to be parameterised [41]), an inclusive cross-section is modelled properly by SuSAv2 as a function of muon kinematics only and the nucleon part is added a posteriori. Here the primary interaction is factorised from both FSI and the sampling of the nucleon spectral function (i.e. both are evaluated independently of an interaction's momentum and energy transfer). Moreover the energy transfer predicted from the inclusive interaction is initially given entirely to the target nucleon(s) and none to the nuclear remnant (the impulse approximation). So, while the model is expected to describe the lepton kinematics well, there is no guarantee that the final state proton kinematics and the proton-muon correlations are properly modelled by such an approach. Despite this, as previously explained, the simulation of exclusive final states is necessary in the data analysis aiming at the measurement of neutrino oscillation parameters. As two obvious examples, we cite the correction for neutron in the neutrino energy reconstruction with the calorimetric approach used by the NOνA experiment [42] and the subtraction of proton background from neutrino interactions in the antineutrino dominated beam. Recent measurements of cross sections as a function of the outgoing muon and proton kinematics and their correlations offer the opportunity to compare such approximated simulations to data, as will be discussed in the next section.

III. COMPARISON OF SUSAV2-MEC WITH OTHER 2P2H MODELS
The SuSAv2 2p2h model is based on quite different theoretical assumptions than the other models available in GENIE: the Valencia model [5,37] and GENIE's own 'empirical' model [43]. The latter is not directly based on any microscopic calculation but is widely used by the µBooNE [44] and NOνA [3] experiments. It places a smooth contribution in the 'dip' area of invariantmass phase space (between the 1p1h and resonant peaks) amounting to around 45% of the strength of the default GENIE RFG 1p1h model. SuSAv2-MEC and the Valencia model are both based on the same fundamental RFG-based 2p2h microscopic calculation [45], but are different implementations of it. A particular difference stems from the treatment of the ∆-resonance propagator. SuSAv2-MEC implements only the real part of the pion-exchange diagrams (∆-resonance propagator) in order to avoid double counting of possible effects related to ∆-excitation effects in both 2p2h channel and the inelastic regime, while the Valencia model implements only partially the real part and partially the imaginary part, including also higher energy resonance exchange (ρ). The treatment of the ∆-resonance propagator in the SuSAv2-MEC model follows refs. [27,45], which are also used by other groups [46][47][48][49][50][51], and can be viewed as an empirical approach that provides very good agreement with (e, e ) scattering data [17,29]. Nevertheless, one could argue that there are also contributions from the imaginary part of the ∆ propagator that do not lead to real pions in the final state. Indeed, the treatment of the ∆-excitation effects is still an open question to be addressed by theoretical models as possible double-counting effects between the 2p2h channels and the inelastic regime could be considered in the analysis, depending on how the inelastic response is modelled and how the medium modification of the Delta decay width is treated. More dedicated analyses of the ∆ propagator will be addressed in further works although some preliminary results have been shown in [31] where overall no large effects are expected for T2K and MINERvA CC0π inclusive measurements ( 10%, mainly at large q 0 for a given q 3 value). Therefore, the inclusion of SuSAv2-MEC in GENIE provides a complementary addition which, crucially, has been care-fully validated using electron scattering data.
The dependence of the SuSAv2-MEC, Valencia and empirical 2p2h neutrino and antineutrino cross sections with the incoming neutrino energy is shown in Fig. 1. It can be seen that all the models differ substantially in both normalisation and shape. At higher energy part of the difference between the SuSAv2 and Valencia models stems from the fact that the latter is only available up to 1.2 GeV of momentum transfer but there are also substantial differences at lower energy as well. This different behavior is due to fundamental differences in the nuclear response functions encoded in the hadron tensors. Indeed, while the only hadron tensor element with explicit energy dependence is the V-A interference term (W 3 in the Valencia model notation in [52]), all of the hadron tensor terms have an implicit dependency on the energy because of the integration limits on q 3 ,q 0 . For a detailed view of the energy dependence of the various hadron tensors in SuSAv2 model, see [18,39].
More of the fundamental differences between the models are made evident when comparing the T2K fluxintegrated cross-section as a function of q 3 ,q 0 as in Fig. 2. Two components are clearly visible in the Valencia model: one at relatively high q 3 ,q 0 , in the region of ∆ resonance, which is related with ∆ excitation diagrams (also called ∆ pion-less decay) and a second component at lower q 3 ,q 0 , in the Quasi-Elastic kinematic region. The SuSAv2-MEC model instead predicts a single wide region of cross-section enhancement in the 'dip' region between ∆ and Quasi-Elastic kinematics. Fig. 3 shows that these starkly different model predictions are observable in experimentally accessible flux-averaged differential cross sections as a function of muon kinematics. The largest differences are visible at larger scattering angles and lower muon momentum. However, despite their notable size, such differences would be difficult to observe in any CC0π or inclusive measurement because of the large uncertainty on the 1p1h component which dominates the cross-section. More exclusive measurements, including information of the proton(s) in the final state have been performed in T2K [53] and Minerva [54] in order to enhance the sensitivity to 2p2h and will be discussed in Sec. V.
Although the microscopic 2p2h models available in GENIE are based on a predominantly inclusive calculation, they are able to predict the relative contributions of neutron-neutron (nn) -or proton-proton (pp) for incoming anti-neutrinos -and proton-neutron (pn) pairs, which are shown in Fig. 3. While the differences in the total 2p2h prediction is fairly small, it is very interesting to note the large differences observed between the SuSAv2-MEC and Valencia models when considering the relative contribution of nn and np pairs. These differences largely stem from the omission of the directexchange interference terms in the Valencia model, which are fully included in the SuSAv2-MEC model. The effect of neglecting the direct-exchange interference of the MEC matrix elements in the 2p2h channel has been shown in previous works [27,28,55], resulting in a negligible effect for pp emission but implying a reduction factor of ∼2 in np emission and thereby largely affecting the np/pp ratio. This can be observed in Fig. 3 when comparing the SuSAv2-MEC model, which fully accounts for these interference terms, with the Valencia one, in which they are absent. Since protons typically deposit much more energy than neutrons, this observation suggests that following an (anti)neutrino 2p2h interaction the SuSAv2-MEC model would produce final states that leave a substantially larger (smaller) observable calorimetric energy deposit than would be predicted with the Valencia model. This is especially relevant for neutrino oscillation experiments which use a calorimetric method of neutrino energy reconstruction, which may see a substantial alteration to neutrino energy reconstruction performance when switching models. Since the np and nn(pp) pairs have notably different hadron tensors, the different relative contributions also lead to different inclusive kinematic predictions. This difference in initial state pair predictions may also act as a signature to allow model differentiation, in particular through semi-inclusive measurements of proton multiplicity which will be discussed in section V. Complementary future measurements of neutrons, such as those which can be performed in scintillator detectors as shown in [56,57], may also prove to be a powerful probe of 2p2h.
Further comparisons of the 2p2h (and 1p1h) predictions (including comparisons with T2K data) can be found in Appendix B.

IV. 'SEMI-SEMI-INCLUSIVE' RESULTS WITH SUSAV2-RMF
Although the available 2p2h models differ substantially, inclusive measurements struggle to distinguish them due to the aforementioned dominant 1p1h contribution and the lack of a region of lepton kinematics particularly enhanced in 2p2h. This is discussed in more detail in Appendix A. However, more exclusive measurements which include information about the final state nucleons, such as those which have recently been performed by T2K [53] and Minerva [54], have been demonstrated to have a much more acute sensitivity to the different nuclear effects involved in neutrino-nucleus interactions. Unfortunately a comparison of these measurements directly to microscopic models requires semi-inclusive or exclusive predictions which the majority of models are not able to make, as they simplify their calculations by integrating over outgoing nucleon kinematics. An exception to this is the RMF model, used to construct the SuSAv2 predictions, which is capable of 'semi-semiinclusive' predictions for neutrino reactions: it is able to calculate outgoing nucleon momenta but not angles 3 . As described in Sec. II, the simulations used by experiments circumvent this limitation by factorising the leptonic and hadronic components of the interaction. Among other approximations, this approach relies strongly on a semiclassical description of FSI and the distribution of initial state nucleon kinematics seen by the probe being independent of its energy and momentum transfer.
The implementation of the SuSAv2 1p1h model in GE-NIE provides a first opportunity to test this factorisation approach. The RMF model is first used to predict an inclusive double-differential T2K flux-integrated cross section in muon kinematics and then another semi-inclusive cross section where the final state proton is below 500 MeV/c (as was measured by T2K). The same exercise is then repeated using the SuSAv2 GENIE implementation where the inclusive prediction should match the original SuSAv2 model (which is identical to RMF over a large portion of the kinematic phase space) almost exactly for low to mid angle muons, and minor differences are expected at very forward angles due to different integration and interpolation methods. The semi-inclusive SuSAv2-GENIE prediction comes from the factorisation method described in Sec. II. To help understand the different elements of the factorisation approximation, the GENIE semi-inclusive prediction is made with/without FSI and with both a kinematic dependent binding energy (as described in Sec. II) and with a fixed value of 25 MeV (around the value often used, e.g. [59]). A comparison of the inclusive and semi-inclusive results from RMF and the GENIE SuSAv2 implementation (alongside the inclusive predictions from the SuSAv2 model) is shown in Fig. 4 in a few bins corresponding to T2K measurements (although the data is not overlaid as the 2p2h and pion absorption components are not evaluated here). The full comparison is available in Appendix C.
In Fig. 4, we can observe a very good agreement between the original SuSAv2 inclusive results and its implementation in GENIE, only minor differences can be observed at very forward angles due to different interpolation and integration methods. When comparing SuSAv2 with the RMF model we observe very similar results at intermediate angles (0.6-0.9) while noticing a decrease of the RMF predictions at very forward angles and backward ones, where a small shift to low muon momentum values, i.e. large energy transfer, can also be observed. These discrepancies are both related to the implementation of RMF effects in SuSAv2: those at backward angles are from an implemented data-motivated transition to weaker FSI than in RMF at larger energy transfers, whilst those at forward angles stem from low energy transfer scaling violations in the full RMF creating difficulties in encapsulating it completely into the super-scaling formalism used. The comparison between SuSAv2 and RMF is discussed further in Appendix C.
Beyond the inclusive comparison, the 'semi-semiinclusive' predictions within the kinematic region where SuSAv2 is a good description of RMF allows us to study the validity of the factorisation approach used in event generators. Here it can be seen that the implementation with both the kinematic-dependent binding energy and with FSI is closest to reproducing the RMF microscopic model prediction, but still appears to peak at too low muon momentum and also fails to describe the higher momentum region. It can also be seen that variations to the hadronic component of the interaction cause substantial alterations to the predictions, highlighting the role of these nonphysical freedoms available within the factorisation approach. Further work will focus on more stringent tests through the implementation of the RMF spectral function into event generators and by exploring the predictions in a wider region of hadronic kinematic phase-space (ideally using a fully semi-inclusive version of RMF).
The T2K semi-inclusive CC0π measurement of interactions with protons less than 500 MeV [53] provides an opportunity to compare the RMF 'semi-semi-inclusive' model predictions to data, which is shown in Fig. 5 alongside the SuSA-GENIE predictions using the factorisation approach. In order to make this comparison the RMF predictions are added to the SuSAv2-MEC (2p2h) and pion-absorption predictions from GE-NIE (for pion production the Berger-Sehgal model was used [60]). A comparison with the T2K measurement of proton multiplicity above 500 MeV is also shown in Tab. I. In general, we observe a fair agreement of both RMF+GENIE (SuSAv2-2p2h+π-abs) and GENIE (SuSAv2-1p1h+SuSAv2-2p2h+π-abs). The overestimation of data at very forward angles by the SuSAv2-GENIE is ascribed to the aforementioned low energy transfer scaling violations absent in the SuSAv2-model but present in RMF, thereby explaining the better agreement achieved with the latter. On the contrary, the larger results from SuSAv2-1p1h at very backward angles compared to RMF are related to the previously discussed FSI treatment alterations. Although the χ 2 statistics obtained from RMF and SuSAv2 are very similar, it is clear that RMF performs better within the most forward angular bins (where additional RMF effects are most important). The fairly large χ 2 for RMF likely stems from an imperfect treatment of the higher angle bins (where it is known that FSI effects may be too strong); a shape discrepancy in the 0.9-0.94 cos θ bin and an underestimation of the data in the final muon momentum bins (which is shared by many models [53]). It can also be seen in Tab. I that, like many models, RMF and SuSAv2 in GE-NIE predict the inclusive cross-section well but then predict too few low momentum protons and too many at high momentum. Stronger nucleon FSI (lower nucleon transparancy) may help alleviate this discrepancy. Although the 'semi-semi-inclusive' comparisons with microscopic RMF predictions provides a powerful test of both the factorisation approach and the model itself, it remains difficult to draw clear conclusions regarding the 2p2h contribution. This can be explored further using full semi-inclusive measurements which can be analysed using GENIE and the factorisation approach. To do this we compare the 2p2h SuSAv2 prediction with measurements of proton and muon kinematics in Fig. 6, and in Fig. 7, as a function of the momentum imbalances between the outgoing muon and highest momentum proton in the plane transverse to the incoming neutrino (see [61] for more details of how these imbalances are defined).
In general the data-simulation agreement is fair, but it is clear from Fig. 6 that the model slightly over-predicts the number of protons above 500 GeV, particularly at more forward muon angles (suggesting the discrepancy is for more high energy neutrinos, since the interactions energy transfer must already be enough to produce the proton). This should be considered in conjunction with the slight under-prediction of the number of protons below 500 MeV, shown in detail in Fig. 5. Overall this might suggest slightly too large 2p2h strength and/or too little FSI, but within the confines of the factorisation approach it is difficult to be certain.
Similar conclusions can be drawn by analysing the the T2K measurement of transverse kinematic imbalance [53,61], which better isolates the 2p2h contribution (particularly through the transverse momentum imbalance, δp T ). The over-prediction in the δp T tail suggests the 2p2h may be too strong, but this cannot account for the simultaneous over-prediction in the bulk. As has been discussed in detail in [62], this can be explained by stronger nucleon FSI, which may also bring the total SuSAv2 prediction into agreement in the tail but could also be a product of the approximations described in Sec. II. It is interesting to note that the SuSAv2 model is able to almost perfectly describe the shape of δp T . The other transverse kinematic imbalance predictions share the normalisation discrepancy but also show a general agreement in the shape. Of particular note is the difference in the SuSAv2 and Valencia model predictions in δα T . The sharp rise at the end for the SuSAv2-1p1h prediction, in comparison to the more gradual rise from the Valencia model, seems slightly preferred by the shape of the result (the last two bins have a weak positive correlation and the rise remains present in the unregulairsed results [53]) and implies that the outgoing nucleon has a more severe deceleration from re-interactions inside the nucleus [61] for the SuSAv2 predictions, despite the same FSI model also being used for the Valencia predictions. It was confirmed that the Valencia and SuSA 1p1h models share very similar δα T predictions if nucleon FSI is disabled in GENIE. This implies that the energy-momentum transfer predicted by SuSAv2-1p1h model tends to eject nucleons with kinematics which have a larger probability of rapid deceleration in the FSI cascade. This shows that δα T can be sensitive to the inclusive interaction kinematics indirectly through FSI processes.

VI. CONCLUSIONS
The SuSAv2 1p1h and 2p2h models have been implemented in the GENIE event generator and shown to produce results consistent with the inclusive model predictions. Both the 1p1h and 2p2h model make substantially different predictions than those currently implemented and so provide an important Complementary addition. In particular, the 2p2h prediction differs substantially in the prediction of the relative number of neutrons and protons in the final state. Critically the SuSAv2 models have been well validated on electron scattering data, making this the first complete (1p1h+2p2h) implementation in GENIE to have been so.
Whilst the implemented models give reliable inclusive predictions, the exclusive and semi-inclusive predictions are based on the widely used factorisation approach which relies on: the impulse approximation; that FSI can be modelled using a semi-classical cascade; and the heavy assumption that the initial state seen by an interaction is independent of its kinematics, although this last assumption is partially mitigated by a kinematic dependent removal energy in the implementation. The implementation of a model which is also capable of 'semi-semiinclusive' predictions (RMF) has allowed us to begin to address the validity of such approximations for 1p1h interactions by comparing the predictions for a CC0π cross section with a constraint on the outgoing proton kinematics from the bare semi-inclusive model and GENIE. Here it was shown that the factorisation approximation was unable to recover the semi-inclusive model predictions, but further investigation is required to quantify the difference.
The semi-inclusive RMF prediction is then combined with GENIE's SuSAv2 2p2h and pion absorption predictions to make an estimation of the measured T2K CC0π cross section with a constraint on the outgoing proton kinematics which is free from factorisation approximations in the 1p1h. The agreement with the data seems generally better in the regions where RMF is expected to work well compared to when using the factorisation approach, demonstrating the importance of improving the treatment of hadronic kinematics in neutrino interaction event generators.
Finally we compare the new model implementation, alongside existing GENIE models, to exclusive T2K measurements sensitive to nuclear effects, including the measurement of transverse kinematic imbalance. Here we find generally fair agreement with the shape of the data but a substantial normalisation discrepancy.
The inability of event generators to reliably predict outgoing hadron kinematics represents a potentially serious issue for reaching the few-% level understanding of neutrino nucleus interactions that will be required for the next generation of long-baseline oscillation experiments. An improved treatment will require increased availability of microscopic semi-inclusive neutrino interaction predictions and their implementation into event generators.

ACKNOWLEDGMENTS
This work was partially supported by the Spanish Ministerio de Economia y Competitividad and ERDF (European Regional Development Fund) under contracts FIS2017-88410-P, and by the Junta de Andalucia (grant No. FQM160). GDM acknowledges support from a Junta de Andalucia fellowship (FQM7632, Proyectos de Excelencia 2011) and a P2IO-CNRS grant. We acknowledge the support of CEA, CNRS/IN2P3 and P2IO, France; and the MSCA-RISE project JENNIFER, funded by EU grant n.644294, for supporting the EU-Japan researchers mobility.
Appendix A: Comparison to T2K CC0π inclusive analysis and implementation validations Fig. 8 shows a comparison of the SuSAv2 1p1h and 2p2h calculation (in GENIE and directly from the model) on top of the GENIE Berger-Sehgal pion absorption contribution to T2K CC0π inclusive results [63], which are in good agreement with the data. As has been shown in in Fig. 4, the slight discrepancies in the very forward going bins can be improved by using the full RMF. As suggested in Sec. IV, it is clear that it is difficult to draw detailed conclusions regarding the 2p2h contribution as the 1p1h in entirely dominant.
Importantly is can also be seen that there is in general very good agreement between the full SuSAv2 1p1h and 2p2h calculations and their implementations in GENIE. The remaining differences in the 1p1h channel stem from interpolation and integration method differences. Whilst these also affect the 2p2h case, the largest difference here stems from the SuSA group's use of a parametrisation of the microscopic model in order to speed up calculations, which is not necessary in the GENIE implementation. To validate that this is the primary source of the small differences observed, the SuSA 2p2h parametrisation was used to build a hadron tensor which was then implemented into GENIE. Fig 9 shows the total crosssection predictions from the SuSA 2p2h model alongside the implementation in GENIE using the hadron tensor taken directly from the microscopic model or taken from the parametrisation. From this it can clearly be seen that GENIE is able to match the SuSA 2p2h when using the hadron tensor taken from their parametrisation and that small differences exist when using the hadron tensor from the full microscopic model. Small differences remain due to the aforementioned integration and interpolation methods (particularly for anti-neutrinos), but these are fairly small.
Appendix B: Further comparisons to the GENIE-Valencia model predictions Fig. 10 and 11 show a comparison of the SuSAv2 and Valencia model predictions (1p1h and 2p2h) as implemented in GENIE on top of GENIE's Berger-Sehgal pion production prediction for T2K inclusive and 'semi-semiinclusive' CC0π results. This clearly shows that the implemented Valencia and SuSA models differ substantially, with only the SuSA model able to describe the very forward data and the Valencia model describing the midangle data a little better. The discrepancies between the model and data is consistent between the inclusive and semi-inclusive results, suggesting that they at least partially stem from the underlying inclusive cross section model.
Appendix C: Full phase-space test of the factorisation approach Fig. 12 shows the full phase-space equivalent of Fig. 4, which, as discussed in Sec. IV, serves as a preliminary first test of the factorisation approach used to extract semi-inclusive predictions from inclusive model implementations in neutrino-nucleus interaction generators. In addition to this, these plots also compare the RMF and SuSAv2 inclusive model predictions. Although the differences were briefly discussed in section IV, here more detail is provided.
Firstly, a very good agreement between the RMF and SuSAv2 inclusive model predictions can be seen at intermediate angles (0.6-0.94), differing by less than 1% for the total cross section integrated over this region. The discrepancy in the backward region is due to a correction in SuSAv2 to account for RMF having too strong FSI in the high momentum transfer region. Here the outgoing nucleon carries a large kinetic energy and it would be ex-pected that the FSI effects should be suppressed for such kinematics. However, this does not happen in the RMF theory due to the strong energy-independent scalar and vector potentials included in the model. In order to account for this effect, the SuSAv2 model introduces effects from the Relativistic Plane Wave Impulse Approximation (RPWIA) -where the initial state is described by a mean field but FSI are neglected -at high momentum transfer by using a q-dependent blending function, as detailed in [15,17]. This effect is fully incorporated into the GE-NIE implementation. In further works [64], an improved RMF model with energy-dependent potentials will solve this issue, making the SuSAv2 model more self-contained and avoiding the need of using RPWIA effects to properly describe high kinematics.
The differences observed at very low kinematics, i.e. very forward angles, are related to the RMF scaling functions employed in the SuSAv2 model. These scaling functions effectively describe the nuclear dynamics of the model and are almost identical for q 400 MeV/c and for different nuclei ('superscaling') [19,20]. However, this scaling behavior is broken at very low q (< 400 MeV/c) where collective effects which violate superscaling dominate. These effects are indeed accounted for in the RMF theory (producing smaller scaling functions at very low q) but are absent in the SuSAv2 approach (which assumes a general scaling function for all kinematics), producing larger SuSAv2 results at very forward angles. This drawback of the SuSAv2 model will be addressed in further works by considering the q 3 -dependence of the RMF scaling functions, which will produce more consistent theoryvs-data comparison at these particular kinematics. Accordingly, a full implementation of the upcoming RMF energy-dependent model [64] in generators will solve this drawback at very low and high kinematics.  Comparison of muon-neutrino single differential 1p1h cross sections on Carbon at T2K kinematics as a function of the muon kinematics as both an inclusive and a 'semi-semi-inclusive' cross section, the latter applying a restriction that there are no protons with momenta above 500 MeV. In the inclusive case the RMF, SuSAv2 model and SuSAv2 GENIE implementation are compared. In the semi-semi-inclusive case the RMF prediction is compared to those of GENIE using the implemented SuSAv2 model and the factorisation approach. The latter is split depending on whether an FSI cascade was applied and whether the nuclear removal energy is fixed or kinematic dependent (as described in Sec. II). All angular bins are shown in Appendix C.      Fig. 4. A comparison of muon-neutrino single differential 1p1h cross sections on Carbon at T2K kinematics as a function of the muon kinematics as both an inclusive and a 'semi-semi-inclusive' cross section, the latter applying a restriction that there are no protons with momenta above 500 MeV. In the inclusive case the RMF, SuSAv2 model and SuSAv2 GENIE implementation are compared. In the semi-semi-inclusive case the RMF prediction is compared to those of GENIE using the implemented SuSAv2 model and the factorisation approach. The latter is split depending on whether an FSI cascade was applied and whether the nuclear removal energy is fixed or kinematic dependent (as described in Sec. II).