ACHILLES: A novel event generator for electron- and neutrino-nucleus scattering

We present a novel lepton-nucleus event generator: ACHILLES, A CHIcagoLand Lepton Event Simulator. The generator factorizes the primary interaction from the propagation of hadrons in the nucleus, which allows for a great deal of modularity, facilitating further improvements and interfaces with existing codes. We validate our generator against high quality electron-carbon scattering data in the quasielastic regime, including the recent CLAS/e4v reanalysis of existing data. We find good agreement in both inclusive and exclusive distributions. By varying the assumptions on the propagation of knocked out nucleons throughout the nucleus, we estimate a component of theoretical uncertainties. We also propose novel observables that will allow for further testing of lepton-nucleus scattering models. ACHILLES is readily extendable to generate neutrino-nucleus scattering events.


I. INTRODUCTION
Interactions between leptons with nuclei at beam energies in the 10 MeV−10 GeV range are key to properly interpret a multitude of experiments, e.g. those that probe neutrino oscillations [1][2][3][4], electron-nucleus scattering [5][6][7][8][9][10], dark matter and dark sectors or even muon-specific new gauge forces [11][12][13][14][15][16]. Nevertheless, modeling these interactions with the percent-level precision required by experimental analyses [3] is a formidable challenge. These difficulties are primarily due to nonperturbative nuclear dynamics, which play an important role in both the hard interaction vertex and in the propagation of the struck nucleons before they exit the nucleus .
Accurately modeling the hard-scattering cross section between a lepton and a nuclear constituent presents nontrivial difficulties. The individual couplings between leptons and nucleons are parametrized in terms of singlenucleon form factors, which depend on the momentum transfer associated with the process and involve nonperturbative QCD dynamics. These form factors can be either calculated from first principles using lattice QCD or fitted to experimental data (see e.g. Refs. [47][48][49][50][51] and references therein). Whichever method is used, care must be taken in properly estimating uncertainties, especially in the axial sector where experimental data are scarce. In addition, scattering does not occur on a collection of free nucleons (which could be described trivially within a Fermi-gas model). Instead, the real-world target nucleus is a correlated quantum many-body system; hence, a percent-level theoretical description of scattering off of a bound nucleon must capture many-body correlation effects within a well-defined factorization scheme like the impulse approximation and its generalizations (see e.g. Refs. [52][53][54][55][56][57]).
The final-state interactions (FSI) that the struck nucle-ons undergo before exiting the nucleus are also extremely complex phenomena [17][18][19][20], subject to non-perturbative single and many-nucleon effects. Examples of the former are nucleon excitations into a ∆ isobar, which quickly decays into a pion-nucleon state. The latter include correlations induced by realistic two-and threenucleon forces. A fully quantum mechanical description of these processes presents an exponentially hard computational problem. Sophisticated nuclear many-body methods leverage leadership-class computing resources to tackle this real-time nuclear dynamics problem but are limited to inclusive processes and to the non-relativistic regime [58,59]. Exploratory calculations on quantum devices show promise [60,61] for treating fully-exclusive processes. However, the inclusion of relativistic effects poses non-trivial challenges, and their application to realistic systems seems to remain a distant goal.
Over the years, a number of complementary methods have been developed to capture the leading effects of FSI. The most sophisticated ones start from the Kadanoff-Baym integro-differential equations for the evolution of the entire nuclear system. In practice, state-of-the-art transport codes solve truncated versions of these equations, which in turn make a proper estimation of theory uncertainties much harder [25,26,28,37,43,62]. On the other hand, intranuclear cascade (INC) approaches approximately solve the transport equation by evaluating the collision term stochastically. The main approximation of INC models is that of classical propagation between consecutive quantum-mechanical scatterings. Hence, they are applicable in the regime in which the de Broglie wavelength of the nucleons is much smaller than the range of the interaction, which is in turn smaller than the average distance between nucleons [63]. Hence, the applicability of INCs is in principle restricted to nucleons with kinetic energies above ∼ 200 MeV, although many observables in heavy-ion reactions at less than 100 MeV are reproduced well in practice [40,64].
Besides the aforementioned intrinsic limitations, INCs involve a number of additional model-specific prescriptions. For instance, the kinematic variables of a struck nucleon are drawn from a model of the nuclear groundstate; typically the (local or global) Fermi gas or realistic spectral functions. In addition, some assumptions for the propagation in the nuclear medium are made. Examples of the latter are the use of mean free path estimates obtained from the nuclear density or propagating nucleons as if they were "hard-spheres" with a radius that is proportional to the square root of the total nucleon-nucleon cross section. Finally, corrections due to the nuclear environment are also typically included by means of average nuclear potentials and imposing Pauli blocking in nucleon-nucleon collisions.
To assess the reliability of such effective descriptions, INCs should be systematically and extensively validated against available experimental data. Electron-nuclei scattering experiments offer large, high-quality data samples including a broad range of experimental observables, which can be used both to benchmark different INC models and to gauge the reliability of their assumptions. Due to the interplay of all these effects, modeling leptonnucleus interactions constitutes a remarkable challenge. Nevertheless, state-of-the-art neutrino event generators have only recently started comparing their predictions to electron scattering data [45,46,[65][66][67][68][69]. The results of these initial comparisons reveal that existing event generators do not describe electron-nucleus scattering data, and consequently neutrino-nucleus scattering data, to the precision level required by next-generation experiments, such as DUNE [70,71].
In this paper we take a first step towards a fullfledged lepton-nucleus event generator: ACHILLES, A CHIcagoLand Lepton Event Simulator. Three aspects of ACHILLES' design are worth highlighting explicitly. First, as neutrino physics enters the precision era, we expect that event generators will need to incorporate many technical improvements and new physical insights. The need for robust, quickly extensible codebases is therefore acute, presenting a challenge not only in scientific computing but also in software engineering. To help solve this inherent difficulty, one of the core design principles of our code is modularity. We have endeavored to divide the code clearly into individual pieces that describe the different physics processes within lepton-nucleus scattering. Examples of the constituent parts include the description of the initial state of the nucleus; the "hard scattering" between the lepton and constituents of the nucleus; the intranuclear cascade process; and the nuclear potential. The advantage of modularity is obvious: an improvement on a specific part of the code requires minimal effort, since all parts are independent. Wellimplemented modular design thus facilitates future improvements and keeping abreast with new advances in the description of lepton-nucleus scattering. We have also tried to provide a user-friendly interface to available tools for physics beyond the Standard Model (BSM), such as the recent lepton-tensor interface developed by some of the current authors [72]. The ultimate goal is to enable ACHILLES users to proceed seamlessly from writing down a BSM Lagrangian to generating events.
Second, the physical structure of the scattering problem also imposes important constraints.
Neutrinonucleus scattering involves both vector and axial form factors (associated with one-and two-body nuclear electroweak current operators), while electron-nucleus scattering is dominated by vector form factors from photon exchange. Thus, any neutrino event generator should be able to describe electron-nucleus scattering data as a special case of the more general problem. Therefore, the benchmarking foundation for any lepton-nucleus event generator must be extensive validation against inclusive and semi-inclusive electron-nucleus scattering data. The present paper is an attempt to begin laying this foundation for ACHILLES.
Third, to leverage the existing analysis tools developed and maintained by the high energy event generators at the LHC, we adopt the HepMC3 output format [73]. This format allows easy interface with analysis tools such as Rivet [74,75] and eventually Nuisance [76], saving valuable research time for users to focus more on physics and less on coding. The HepMC3 output format permits arbitrary parameters to be added to an event, allowing additional event information required by neutrino experiments to be included in a simple and straightforward manner.
This paper constitutes the first step towards the full development of ACHILLES. We perform a comparison between our model of electron-nuclei interactions, including the INC model developed in Ref. [45], against the recent reanalysis of electron-carbon scattering data by the CLAS/e4v collaboration [69]. Compared to our previous work, and to obtain more realistic results for exclusive observables, we implement a nuclear potential and simulate the propagation of nucleons within this potential. We focus on the inclusive quasielastic (QE) cross section, which is better understood than other cross section channels at the energies of interest [77,78]. We also consider the angular dependent proton yield, as well as a few other kinematical observables in the QE regime.
The comparisons performed here test the modelling of the cross section; the impact of initial-state nuclear configurations, particularly those obtained via quantum Monte Carlo methods [79] and the roles of the spectral function, in-medium modifications, and Pauli blocking effects. Taken together, these comparisons provide valuable insight to the physics of intranuclear cascades. Besides these comparisons, we also propose new observables that may help in further testing models of electron-nuclei and neutrino-nuclei interactions. All proposed observables can be readily extracted from existing CLAS data.
The paper is organized as follows. Sec. II lays out general considerations for lepton-nucleus scattering. Sec. III compares the results of ACHILLES to experimental data on inclusive electron-nucleus scattering. Sec. IV A describes in medium effects from the nuclear potential. Sec. V provides comparisons to exclusive data. Sec. VI proposes novel observables to further test the interaction modelling, following by conclusions in Sec. VII.

II. GENERAL LEPTON-NUCLEUS SCATTERING
The general expression of the differential cross section for a scattering process involving a target nucleus A and a lepton leading to a given final state reads We denote the initial-and final-state momenta by where the index f refers to all the possible hadronic and leptonic final state particles. The first term in parenthesis is the flux of incoming particles, with v ,A being the velocities, the second encodes the matrix element, and the last line is the phase space for the outgoing particles.
In the one-boson exchange approximation, the squared amplitude reads where P is a generic vector boson propagator, while L and W denote the leptonic and hadronic tensors, respectively. The leptonic tensor is completely determined by the leptonic process (e.g. neutrino charged-current interactions). The hadronic tensor on the other hand contains all information on nuclear dynamics and it is expressed as where Ψ 0 and Ψ f denotes the hadronic initial and final states, respectively. Carrying out the full calculation of the manybody wave function and its real-time evolution is an exponentially-hard computational problem. To handle it, the reaction process is modeled by separating the primary interaction vertex from the propagation of the struck particles out of the nucleus. Schematically, this division can be expressed by considering the full matrix element squared as: where the {k}({p}) is the set of all initial(final) state particle momenta, V represents the primary interaction vertex producing intermediate particles with momenta {p }, and P denotes the time evolution of the intermediate states to the final states outside the nucleus. Calculating this equation exactly requires retaining full quantum mechanical interference between the primary interaction vertex and the subsequent re-interactions. Traditionally, due to the complexity of solving Eq. (7) exactly, the calculation factorizes the two-step process as an incoherent product: This treatment is similar to the approach taken by the collider community when dressing hard-scattering cross sections with parton showers (see e.g. Ref. [80] for a pedagogical discussion). By construction, this approximation neglects inference between primary interaction vertices that give rise to identical final states while leaving inclusive observables unaffected. It is expected that these interference effects are subdominant, and a detailed investigation is left to a future work. In ACHILLES, the subsequent evolution probability |P| 2 is handled semiclassically using the algorithm developed in Ref. [45]. Eqs. (7) and (8) retain full generality for lepton-nucleus scattering, but implementing them in a concrete calculation requires several choices about the relevant degrees of freedom. First, one must specify the initial-state nuclear constituents {k} which participate in the vertex V. This question is closely related to the choice of a factorization scheme, to which the following section is dedicated. Second, one must specify the intermediate-state particles which can appear, either from production at the primary interaction vertex V or in the system's subsequent evolution P. Briefly stated, the present work restricts to processes in which protons and neutrons are the only active degrees of freedom. This choice explicitly neglects, e.g., pion production at the primary interaction vertex. Under this ansatz, the electroweak current of Eq. (6) is expanded as a sum of one and two-nucleon operators Three-and higher-body terms have been found to be small [81] and are thus neglected here. Generalizing these expressions to other mediators, such as scalars, is straightforward.

A. Factorization scheme
As mentioned above, this work focuses on a lepton scattering on a nucleus in the quasielastic regime, in which the dominant reaction mechanism is assumed to be single-nucleon knockout: + A → + X where and denote the initial and final lepton states, A is the target nucleus, X is the hadronic final state, which for example can be composed of a single emitted nucleon and the remnant nucleus. We can rewrite the cross section of Eq. (1) for a 2 → 2 scattering process in a background as where again p refer to final state momenta, with X denoting the hadronic final state, and the factor 1/|v A −v | = 1 in the lab frame (neglecting the lepton mass). Note that, for convenience, we absorbed the factors 1/2E in in the hadronic (leptonic) tensor, and we are also embedding the delta function in the hadronic tensor. For large enough values of the momentum transfer, the virtual boson primarily interacts with individual bound nucleons, so that the hadronic final state can be approximated by the factorized expression where |p is a plane wave describing the propagation of the final state nucleon with momentum |p|, while |Ψ f A−1 denotes the (A − 1)-body spectator system, which can be either in a bound or unbound state. In addition, since we are focusing on the primary interaction vertex, we have dropped the prime in the intermediate variables.
Retaining the one-body current contribution only in Eq. (9), the incoherent contribution to the hadronic tensor is given by In the previous equation, denote the energy and momentum of the initial and final nucleons, respectively. Note that, for brevity, we have suppressed the subscript h in the bras and kets. The spectral function yields the probability distribution of removing a "hole" nucleon h ∈ {p, n} with momentum k h from the target nucleus, leaving the residual (A − 1) system with an excitation energy E , and it is defined as [82] where the sum runs through the possible final states of the A − 1 spectator nucleons, which can either be bound or in the continuum.
The spectral function of finite nuclei is generally expressed as a sum of a mean-field and a correlation contribution. The first one describes the low momentum and removal-energy region, and is associated with the residual A − 1 system being in a bound state. The correlation contribution includes unbound A − 1 states for the spectator system, in which at least one of the spectator nucleons is in the continuum, and it provides strength in the high momentum and energy region. The nuclear spectral function has been evaluated within different semi-phenomenological [83,84] and ab-initio many-body methods [85,86], including quantum Monte Carlo [87].
The one employed in this work has been obtained within the correlated basis function theory of Ref. [83]. The low momentum and energy contribution is determined by adjusting mean-field calculations to reproduce (e, e p) scattering measurements. The correlation part is derived within the Local Density Approximation by convoluting the correlation component of the spectral function obtained within the correlated basis function theory for isospin-symmetric nuclear matter for a given value of the density. Additional details regarding the spectral function and, in particular, corrections to the impulse approximation stemming from final-state interactions appear in Sec. III B.
The spectral function is normalized as where Z denotes the number of protons in the nucleus. After applying the factorization ansatz to the hadronic final state, the phase space factor of Eq. (10) can be rewritten as where the discrete sum over the states of the remnant nucleus is embedded in the spectral function as shown in Eq. (13). A complete estimate of the theoretical uncertainty associated with the cross section calculation would require assessing the error in the many-body calculation of the spectral function, the inputs used to describe the interaction vertex (i.e. couplings, form factors), and the factorization of the hadronic final state. Achieving this goal is highly nontrivial and has not been included in this work but future developments are discussed in Sec. VII.

A. Theoretical preliminaries
Now we proceed to the concrete calculation of the electron-nucleus cross section, which will be the basis of all comparisons between ACHILLES and electron-carbon scattering data. We focus first on comparisons between our theoretical predictions and experimental data for the quasielastic inclusive electron-12 C cross section using the aforementioned factorization scheme and spectral function formalism. For those kinematics in which FSI are expected to be negligible, the inclusive cross section provides a benchmarking test for the model of the primary interaction (|V| 2 in Eq. 8), since this observable is unaffected by the semi-classical propagation in the nuclear medium, and hence by the INC.
The inclusive double differential cross section for the scattering of an electron on an at-rest nucleus via onephoton exchange is written as (see Eqs. (2-4) for notation) where α 1/137 is the fine structure constant, and Ω e is the scattering solid angle in the direction specified by p e . The energy and the momentum transfer are denoted by ω and q, respectively, with Q 2 = −q 2 = q 2 − ω 2 . The lepton tensor is fully determined by the lepton kinematic variables and, neglecting the electron mass, it is given by The one-body electromagnetic current operator entering Eq. (12) is written as where the isoscalar (S) and isovector (V) form factors, F 1 and F 2 , are given by combination of the Dirac and Pauli ones, F 1 and F 2 , as τ z is the isospin operator, and The Dirac and Pauli form factors can be expressed in terms of the electric and magnetic form factors of the proton and neutron as with τ = Q 2 /4m 2 N . Therefore, the electromagnetic current can be schematically written as j µ 1b,EM = j µ γ,S + j µ γ,z where the first is the isoscar term and the second is the isovector multiplied by the isospin operators τ z . The above set of equations can be readily extended to the electroweak case and higher multiplicity processes; an automation for arbitrary leptonic tensors was developed in [72].
The use of a realistic spectral function combined with a factorization scheme has proven to reproduce a large fraction of the available electron scattering data (see Ref. [88] and references therein). Over the past few years, the factorization scheme has been extended to account for two-nucleon currents and pion-production mechanisms [57,[89][90][91][92][93]. The focus of the present work is the quasielastic region, and we leave the implementation of additional channels to a future work.

B. Comparison to data
The first comparison between ACHILLES and data can be found in Fig. 1 [96]. In all four cases, the first peak, which is dominated by quasielastic scattering, is quite well described by ACHILLES. Note that meson-exchange currents provide additional strength in the dip region between the quasielastic and the resonance peak [77]. The second peak has large contributions from resonance production, a mechanism which has not yet been implemented in ACHILLES, and therefore it is not expected to be reproduced by the present version of the code.
Given the large Q 2 values of the data displayed in Fig. 1, FSI between the struck nucleon and the remnant nucleus are expected to be small and have been neglected in the initial hard interaction. For kinematics in which the factorization scheme is not expected to hold, different approaches have been developed to account for quantummechanical effects in FSI in the quasi elastic region. To correct the factorization scheme and spectral function results, the real part of an optical nuclear potential [97] is added to the free energy spectrum of the outgoing nucleon and the cross section is convoluted with a folding function to account for rescattering effects [52,98]. In the Relativistic Mean Field approach, FSI between the outgoing nucleon and the residual nucleus are accounted for by solving the associated Dirac equation using the same mean field as used for the bound nucleon [99]. Including these corrections modifies the inclusive cross section, shifting the quasi elastic peak to lower energy transfers and redistributing the cross section strength to the highenergy-transfer tail [100].
However, the aforementioned approaches do not allow for an accurate treatment of exclusive processes. In this regard, INCs are a common tool [31-33, 35, 38, 41, 42, 45, 46, 101] for modeling the total hadronic state that escapes the nucleus after the hard interaction vertex, as described by Eq.  [95]. Bottom left: Scattering with an incoming energy of 1300 MeV at an angle of 37.5 • , data is from [95]. Bottom right: Scattering with an incoming energy of 2500 MeV at an angle of 15 • , data is from [96].
tional scattering occurs. This probabilistic treatment, at least in current algorithms, neglects interference effects. Therefore, by definition INCs leave inclusive observables, such as the differential cross section displayed in Fig. 1, unchanged.
It is important to note that the FSI modeled by folding functions and the FSI modeled by INCs arise from the same physics. The major differences between the two approaches are the approximations used to include the imaginary part of the nuclear potential. Folding functions account for the effects of FSI including interference effects at the cost of integrating out information on the final state nucleons. On the other hand, INCs capture the exclusive final state nucleons at the cost of neglecting the interference effects. Therefore, combining the two calculations in a single code results in effectively double counting the imaginary part of the nuclear potential. Implementing interference effects into INCs is beyond the scope of this work.
Having established that our interaction model of quasielastic interactions (i.e., V in Eq. 8) reproduces the experimental data, we next move to comparisons with exclusive observables.

IV. INTRANUCLEAR CASCADE
Simulating the propagation of the nucleon involved in the hard scattering out of the nucleus is a vital component of a neutrino event generator. Intranuclear cascade models are a class of algorithms used to reproduce the imaginary part of the nuclear potential using stochastic Monte Carlo methods. Traditional techniques do not capture the quantum mechanical components involved in this process. Recently, a new technique for intranuclear cascades has been proposed in Ref. [45] to begin capturing these effects. Figure 2 shows the programmatic flow of our cascade model.
In this algorithm, the spatial distribution of neutron and protons are sampled from nuclear configurations obtained from QMC calculations fully retaining correlations effects. Their initial momentum is generated according to a local Fermi gas model. Once the target and the projectile are initialized, the particles are propagated using relativistic kinematics. In the simplest approximation, these particles follow straight-lines trajectories, but an option to bend these trajectories using nuclear potentials is discussed in the next section. We follow a time-like approach for the propagation; at each step of the propagation δt we check if an interaction occurred according to the nucleon-nucleon scattering cross section using either a Gaussian or cylindrical probability model depending on the impact parameter. Originally, the only in-medium effect was taken to be the Pauli principle, and below we discuss updates using the nuclear potential. We keep two separate lists of "propagating" and "spectators" particles. At the beginning of the event, the projectile is the only propagating particle. Afterwards, each particle that has collided with a spectator is promoted to a propagating one, while all the others are still labeled as spectators. The particles are propagated until they reach the surface of the nucleus where the nucleon is either recaptured or escapes, based on its energy.

A. Nuclear potential
In electron-nucleus and neutrino-nucleus scatterings, inclusive quantities may be well described without detailed modelling of what happens when nucleons are propagating out of the nucleus. The description of exclusive quantities is more demanding. While nucleonnucleon interactions are possibly the most important effects to include in an INC model, the presence of a meanfield nuclear potential may trap struck nucleons or deflect their trajectory, effectively changing the number, momentum and direction of outgoing particles. To account for this effect, we have implemented two different options as a background potential, which depends on both the position r and momentum p of the propagating nucleon. Note that, in our approaches, only the real part of the potential is included, since the imaginary part is captured by the hard scattering in the intranuclear cascade. A similar approach of including the potential into cascades was studied in Ref. [102]. The first potential considered is a non-relativistic potential defined by a three-parameter fit to single-particle energy of infinite nuclear matter [103], which is consistent with the variational ground-state calculations of Wiringa, Fiks, and Fabrocini (WFF) [104]. Its functional form is given as where p is the modulus of the three momentum of the propagating nucleon, while α, β, and Λ are fit to reproduce the single-particle energy of nuclear matter as ob-tained from the Urbana v 14 + TNI Hamiltonian, and ρ(r) is the local nuclear density at radius r. The values of the aforementioned variables are where ρ 0 = 0.16 fm −3 is the saturation density of nuclear matter.
The other potential we adopted is based on the work of Ref. [97] where proton-nucleus elastic and reaction cross section data are fitted to determine global proton-nucleus optical potentials for energies between 20 and 1040 MeV for several nuclear targets, including carbon. The fitting can be done with potentials in a Dirac equation or Schroedinger equation. For the former case, the Dirac equation was used in the form where V c (r) denotes the Coulomb potential at a given nuclear radius r, which is either computed from Woods-Saxon-like charge distribution [105] or taken from data when they are available, and E is the energy of the propagating nucleon. The quantities determined by the fitting procedure are U s (r, E ) and U 0 (r, E ), the scalar and vector optical potentials, respectively; they include a real and an imaginary part. To obtain an effective optical potential for the Schroedinger equation, it is helpful to write down a standard reduction of the Dirac equation to second order form. The equation for the upper two components is where S and L are the total spin and angular momentum of the nucleus, respectively. We can identify U eff and U SO as effective Schroedinger-equation central and spin-orbit potentials that can be constructed from U s (r, E ) and U 0 (r, E ). Note that the Schroedinger-equation central potential also includes the Darwin term accounting for relativistic corrections, and its effect is more pronounced in the nuclear interior [106]. The spin-orbit term is significantly smaller than the central one, and for this reason it has been neglected in the present work. In the remainder of this paper, we will denote the potential obtained from Ref. [97] retaining only the central contribution as the Schroedinger potential. There are two ways that the potential plays a role within the cascade algorithm. Firstly, the potential modifies the hard interactions that occur between nucleons, often referred to as in-medium modifications in the literature. In this work, we only consider the non-relativistic in-medium corrections as implemented in Ref. [107]. To account for in-medium corrections due to the nuclear potential, we modify the differential cross section using where p 1 , p 2 are the momenta of the incoming propagating nucleons, and p 3 , p 4 are the momenta of the outgoing propagating nucleons. The effective nucleon mass m * is given as This in-medium correction approximates the in-medium matrix element to be the same as the free matrix element, and that U (p 1 , ρ) + U (p 2 , ρ) ≈ U ( (p 2 1 + p 2 2 )/2, ρ). We leave the expansion to the relativistic case to a future work. Note that we assume the potential to remain the same regardless of INC dynamics. While this is certainly an approximation that will fail when the nucleus suffers a "hard" breakdown, it should be reasonable when the number of exiting nucleons is much lower than the number of a nucleons in the nucleus.
We also consider the long-distance effect of a background potential on the nucleon as it propagates through the nucleus. We simulate a particle propagating by classical Hamiltonian evolution of the system. The equations of motion can be written as The equations above are clearly a set of coupled differential equations. In order to maintain conservation of energy, a symplectic integrator is used for the evolution.
Since these differential equations are coupled, traditional symplectic integrators will not work. It was shown in Ref. [108] that it is possible to use symplectic integrators by working with an augmented Hamiltonian in an extended phase space. App. B provides technical details.

V. COMPARISON WITH EXCLUSIVE OBSERVABLES
We now proceed to the analysis of exclusive observables in electron-carbon scattering. Exclusive quantities are particularly relevant for neutrino experiments, especially those based on the liquid argon time projection chamber (LArTPC) technology such as the SBN detectors [109] or the future DUNE experiment [3]. LArTPCs are able to identify and reconstruct tracks of all charged particles in a neutrino scattering event in exceptional detail. This capability allows these detectors to reject backgrounds and optimize searches more efficiently. If we take as an example the recent MicroBooNE search for single photons [110] as an explanation of the Mini-BooNE low energy excess [111], we can appreciate the importance of exclusive quantities: the one-photon-zeroproton sample has a background rate 7 times higher than the one-photon-one-proton sample, and this can largely be attributed to the inability to reconstruct the ∆ → N γ invariant mass in the absence of a proton track. Several other examples can be made, but the point is that describing correctly exclusive observables will be crucial in current and future neutrino experiments.
The CLAS and e4v collaborations have recently reported a study of energy reconstruction in electronnucleus scattering data, using methods employed in neutrino experiments [69]. The collaborations analyzed electron scattering data taken with CLAS at JLab for three different beam energies: 1.159, 2.257, and 4.453 GeV. The detection thresholds for hadrons were similar to thresholds at current and future neutrino experiments. The analysis focused on the reconstruction of several exclusive and differential quantities, such as incoming electron energy reconstruction for 0π events, calorimetric reconstructed energies for 1p0π events, transverse variables, proton multiplicity, and so on.
In what follows, we describe the comparison between our generator and CLAS data, for all available observables, focusing on quasielastic electron-carbon scattering. The CLAS/e4v collaborations have reweighted their data by a factor Q 4 /GeV 4 , where Q 2 = −q 2 is the fourmomentum transfer which can be obtained with final and initial electron kinematics. This was done to have a better comparison with neutrino events: at these energies, while electron-nucleus scattering is dominated by photon exchange, neutrino-nucleus scattering can be very well approximated by a four-fermion interaction. Here, we do the same in an event by event basis.
We adopt the same CLAS acceptances and mimic the energy resolution as described in Ref. [69], after corrections for undetected particles. The electron and proton energies are smeared by 1.5(0.5)% and 3(1)% for the 1.159 (2.257 and 4.453) GeV beams, respectively. Protons were detected with momentum p p > 300 MeV and angle with respect to the beam direction 12 • < θ p . Electrons were detected with energy E e > 0.4, 0.55, and 1.1 GeV for E beam = 1.159, 2.257 and 4.453 GeV, as well as angles with respect to the beam direction where p e is the electron momentum, i = 1, 2, 3 refers to the three beam energies in increasing order, θ i 0 = 17 • , 16 • , 13.5 • and θ i 1 = 7 • , 10.5 • , 15 • . Since we do not simulate production and propagation of pions, we do not list their acceptances here.
We start with the double differential cross section where Ω e is the solid angle and E e is the outgoing electron energy, as a function of the energy transfer ω ≡ E in e − E e , for fixed outgoing electron angle of 37.5 • with respect to the beam axis and beam energy E in e = 1.159 GeV, see Fig. 3. Hereafter we present ACHILLES results for several different variations on the implementation of the INC, namely, the nucleon-nucleon interaction model (Cylinder vs. Gaussian, see Ref. [45]) and the real part of the nuclear potential (WFF, Schroedinger or none). Here "none," is used as a baseline prediction for the model described in Ref. [45]. Different treatments will be color coded and indicated by an inset in all figures. The spread among the lines can be interpreted as one of the theoretical uncertainties on the lepton-nucleus interaction modeling. Inclusive observables, such as those displayed in Fig. 3 are not affected by the semi-classical intranuclear cascades. Therefore, the different lines lie on top of each other.
At low energy transfer, quasielastic scattering dominates the cross section. In this region, particularly for 0.1 < ω < 0.4 GeV, our generator describes the data fairly well, except for the small-energy region where the theory underestimates the data. The agreement with data would be improved by the interference effects neglected in intranuclear cascades (as discussed at the end of Sec. III B), yielding an enhancement of the strength at low ω. However, a naive combination of our intranuclear cascade and a folding function would result in a double-counting for exclusive observables, as discussed above. For this reason it has not been included in our calculation.
The missing strength in the quasielastic region and towards higher energy transfers is largely ascribed to meson exchange, resonance production and deep inelastic scattering contributions currently neglected in our analysis [77]. As one goes beyond this region towards higher energy transfers, the quasielastic contribution shrinks and one expects other components of the cross section to be more relevant, which explains the discrepancy between the data and our generator. Overall, this level of agreement is an encouraging result.
Another comparison we make is on the lepton energy reconstruction assuming quasielastic scattering. The quasielastic energy reconstruction is done based off the methodology used by water Cherenkov detectors, such as MiniBooNE and T2K. In this case, only charged leptons and pions are measured. Assuming that the neutrino scatters quasielastically from a stationary nucleon within a nucleus, its incoming energy can be reconstructed as where m N is the mass of the nucleon, is the average nucleon separation energy (we use 21 MeV for carbon), E (p ) is the energy (momentum) of the outgoing lepton, and θ is the angle of the outgoing lepton with respect to the beam axis. The different scheme choices discussed in this paper are compared to the measured E QE distribution for a 1.159 GeV electron beam on carbon from the CLAS data [69] in Fig. 4.
Here the peak around the beam energy is dominated by the quasielastic contribution, while the tail towards lower values of E QE is dominated by meson exchange currents and resonance production. Therefore, we only expect our results to approximately reproduce the peak, which is what is shown. The agreement with the data for larger values of E QE is likely to be improved by the interference effects neglected by intranuclear cascades. However, a more detailed analysis of the discrepancy will be carried out in the future when meson exchange currents are included in ACHILLES. Analogously to Fig. 3, this distribution has no information about the outgoing protons contained within it. Therefore, we expect that the prediction should be insensitive to the cascade parameters, as can be seen in the small spread of the colored lines.
In liquid argon time projection chamber experiments, such as MicroBooNE and DUNE, the ionization energy is used as a mean to reconstruct the incoming neutrino energy. In this case, the calorimetric energy is defined as where E i is the energy of the lepton or pions or the kinetic energy of the protons, and is the average nucleon separation energy. In the CLAS data, E cal was calculated for events that contained exactly one proton and Since neutrons do not contribute to the calorimetric energy, we expect this observable to be sensitive to the modeling of the intranuclear cascade. The peak of these distributions correspond to the beam energy and is dominated by the quasielastic contribution. The tail towards lower energies is due to the intranuclear cascade, a result of the proton interacting with other nucleons as it escapes the nucleus, as well as non-quasielastic interactions, which are not currently implemented in ACHILLES. Around the peak, the largest difference in peak height due to distinct implementations of the INC is about 7%. To be conservative, we will quote the INC theory uncertainty as the largest difference among all INC implementations. To further study the accuracy of event simulation, the CLAS and e4v collaborations studied three different transverse momentum related observables. The first observable is the transverse momentum defined as where p T e,p is the transverse vector momentum with respect to the beam axis for the electron and proton, respectively. Note that a lepton scattering off protons at rest would only lead to p T ≡ |p T | = 0. Fermi motion of nucleons in the nucleus will lead to a distribution of p T around 100-200 MeV, while the intranuclear cascade can introduce a long tail towards large values of p T . This observable is compared in Fig. 6 to the e4v data. The cascade tends to broaden the spectrum in the quasielastic region, increasing the maximum value observed. The spread due to different INC implementations is about 6%. The p T > 0.2 GeV region has significant contributions from non-quasielastic processes. To isolate contributions from different nuclear processes, a cut is applied in the p T variable before constructing the E cal distributions. The results are shown in Fig. 7 for a cut of p T < 200 MeV (top panel), 200 MeV < p T < 400 MeV (middle panel), and p T > 400 MeV (bottom panel). Again, intranuclear cascades affect the low E cal tail significantly, together with non-quasielastic interactions. This is most evident in the p T > 400 MeV plot, in which both effects are expected to have large impact. We find a theory uncertainty associated to the INC implementation of 5% to 6% near the peak of all distributions.
The other two observables we use to validate ACHILLES are 1 Note that p T = −q T , so δα T is the angle between the overall transverse momentum and the transverse momentum transfer. Our results for δα T are shown in Fig. 8.
In the limit of no final state interactions, p T is simply the initial proton transverse momentum. Since the initial proton momentum is isotropic, δα T should also be isotropic in this limit. The increase in the high-angle region of the δα T distribution can be attributed to intranuclear cascades and non-quasielastic interactions. We find an INC theory uncertainty in δα T of about 10%. On the other hand, δφ T measures the opening angle between the proton and the transverse momentum transfer. We present the comparison to data in Fig. 9. In the absence of both final state interactions and initial proton momentum, we have p T p = −p T e and thus δφ T is a delta function at zero. In the presence of Fermi momentum, the struck proton has a nonzero momentum, k p = 0, which smears the δφ T distribution around zero by O(k T p /p T e ). Final state interactions help to smear out the distribution to larger opening angles, partially explaining the high δφ T tails in Fig. 9. The INC uncertainty is found to be about 5%.
Finally, notice that observables for which final-state interactions play an important role offer the greatest sensitivity to the implementation of the INC model. This  sensitivity is visible in the spread in the color histograms and is particularly evident, for example, in the E cal distribution for higher p T (see the bottom panel of Fig. 7) and in the high angle region of the α T distribution (see top and middle panels in Fig. 8).

VI. OTHER OBSERVABLES
In this section we propose additional key observables that could be measured in current and future electron-  [69]. The definition of δα T can be found in Eq. 34.
nucleus scattering experiments, such as CLAS12 [10] or LDMX [112]. The goal is to encourage the experimental collaborations to present such observables that will ultimately serve to validate lepton-nucleus interaction models.
Let us start with an exclusive differential observable that is highly sensitive to final state interactions: the proton multiplicity energy spectrum. As we currently do not have pions propagating in our intranuclear cascade modeling in ACHILLES, we focus on np0π events. Taking the 2.257 GeV electron beam as an example, for every event, we count the number of protons that pass experimental cuts (see Sec. V). Then we take all leading-energy protons in events with at least one proton and build their energy spectrum. We repeat the procedure for all secondand third-leading protons, in events with at least two or three protons, respectively. The results of this procedure are the proton energy spectra shown in Fig. 10, from the leading proton in the upper panel to the third leading proton in the lower panel. We would expect this distribution to be highly sensitive to the intranuclear cascade model. INCs may raise the proton multiplicity, contributing to the spectra of second and third protons, and tend to distribute the energy among all outgoing protons, shifting the leading proton spectrum towards lower energies. This is indeed observed when comparing the INC uncertainties in peak regions in the three panels of Fig. 10, which are approximately 3%, 12%, and 15%, from top to bottom. We also expect that other cross section channels will significantly contribute to this observable in a nontrivial way. For example, while DIS occurs for higher momentum transfer, hadronization followed by final state interactions may lead to a large multiplicity of low energy protons. The proton multiplicity energy spectra for all interaction channels will be a subject of a future publication.
In the same vein, we also propose the proton multiplicity angular spectra. We take again the 2.257 GeV electron beam as an example. We plot the angle of the leading, second and third protons with respect to the beam axis in Fig. 11, from the leading proton in the upper panel to the third leading proton in the lower panel. Note that we have decided to order the protons according to their energies. Our main motivation lies on the fact that higher energy protons are more relevant to the reconstruction of neutrino energies, and therefore a correct description of the leading protons is more relevant than the subleading ones. Again, we expect intranuclear cascade models to play a crucial role here, as well as the other interactions channels, which will be the studied in a future publication.
Another interesting observable is the angle between the sum of the momenta of all visible outgoing particles with respect to the beam axis. The only particle we take to be invisible here are neutrons. We apply the 1p0π selection cuts from the CLAS/e4v analysis, see Sec. V. This angle would be zero in the case of an electron scattering on a free proton at rest. This observable is motivated the physics of atmospheric neutrinos. In this sample, the incoming neutrino direction needs to be reconstructed in order to estimate the neutrino path through the Earth and in the oscillation probabilities. A measurement of atmospheric neutrinos in the 0.1 − 1 GeV scale at the DUNE experiment could provide nontrivial information on the CP violation phase [113], and could also be used to perform a tomography study of the Earth, contributing to our understanding of the chemical composition of its core [114].
To be more precise, we define the reconstructed beam angle in electron-nucleus scattering as cos θ rec ≡k e · p out |p out | , where p out is the sum of all momenta of visible outgoing particles andk e is a unit vector in the beam direction.  intranuclear cascade, as a proton may scatter off a neutron which in turn may be invisible to most detectors of interest; and nuclear potential, which may deflect the outgoing proton. Our results are found in Fig. 12

VII. CONCLUSIONS
We have presented a newly developed lepton-nucleus event generator, ACHILLES. Our generator factorizes the primary interaction vertex from the propagation of hadrons througout the nucleus, allowing for a great deal of modularity, which is one of the pillars of ACHILLES. Due to this modularity, ACHILLES can be used for generating either electron-nucleus or neutrino-nucleus scattering events, and the implementation of numerous scenarios for physics beyond the Standard Model is straightforward.
We have validated quasielastic scattering against high quality, inclusive and exclusive, electron-carbon data, including the recent CLAS/e4v reanalysis of existing data. We find good agreement between data and simulation. A complete estimate of the theoretical uncertainty associated with the nuclear model and the current operator adopted in the description of the primary interaction vertex is highly non-trivial and has not been included in this work. A promising avenue to quantify model dependence involves testing different nuclear many-body methods, possibly including different nuclear currents, form factors, and Hamiltonians as inputs. A study along these lines has been carried out in Ref. [87] where the inclusive differential cross sections for electron scattering on 3 He and 3 H have been evaluated using different many-body approaches based on the same description of nuclear dynamics. Inputs from LQCD calculations, such as nucleon form factors and elementary nucleon matrix elements, will be incorporated as they become available.
By varying model assumptions of the intranuclear cascade (namely, different nucleon-nucleon interactions models and nuclear potentials), we have estimated one component of the overall theory uncertainty budget in electron-nucleus scattering. For observables that are sen-sitive to final-state interactions, the theoretical model dependence associated with different intranuclear cascade models is typically a 5-10% effect. Theory uncertainty estimates will be crucial for a precision neutrino physics program, in particular for the DUNE experiment.
We have also proposed novel observables that will allow for further validation of lepton-nucleus scattering models. Although we have only analyzed electron-carbon scattering data in the quasielastic region, our code is readily extendable to generate neutrino-nucleus scattering events. Comparison against neutrino scattering data, as well as the inclusion of other primary interaction modes, such as resonant scattering, meson exchange current and deep inelastic scattering, will be subjects of future publications.  perpendicular to the radius starting at a radius of r = 1 fm in the non-relativisitc Wiringa potential (blue) and the relativistic Cooper potential (red). The simulation is run for 100,000 time steps, and the maximum energy deviation is of the order of 10 −4 . The deviation is periodic in nature, which is a common feature for symplectic integrators.