Improved constraints on neutrino mixing from the T2K experiment with $\mathbf{3.13\times10^{21}}$ protons on target

The T2K experiment reports updated measurements of neutrino and antineutrino oscillations using both appearance and disappearance channels. This result comes from an exposure of $14.9~(16.4) \times 10^{20}$ protons on target in neutrino (antineutrino) mode. Significant improvements have been made to the neutrino interaction model and far detector reconstruction. An extensive set of simulated data studies have also been performed to quantify the effect interaction model uncertainties have on the T2K oscillation parameter sensitivity. T2K performs multiple oscillation analyses that present both frequentist and Bayesian intervals for the PMNS parameters. For fits including a constraint on \ssqthonethree from reactor data and assuming normal mass ordering T2K measures $\sin^2\theta_{23} = 0.53^{+0.03}_{-0.04}$ and $\Delta{}m^2_{32} = (2.45 \pm 0.07) \times 10^{-3}$ eV$^{2}$c$^{-4}$. The Bayesian analyses show a weak preference for normal mass ordering (89% posterior probability) and the upper $\sin^2\theta_{23}$ octant (80% posterior probability), with a uniform prior probability assumed in both cases. The T2K data exclude CP conservation in neutrino oscillations at the $2\sigma$ level.

The T2K experiment reports updated measurements of neutrino and antineutrino oscillations using both appearance and disappearance channels. This result comes from an exposure of 14.9 (16.4) × 10 20 protons on target in neutrino (antineutrino) mode. Significant improvements have been made to the neutrino interaction model and far detector reconstruction. An extensive set of simulated data studies have also been performed to quantify the effect interaction model uncertainties have on the T2K oscillation parameter sensitivity. T2K performs multiple oscillation analyses that present both frequentist and Bayesian intervals for the PMNS parameters. For fits including a constraint on sin 2 θ13 from reactor data and assuming normal mass ordering T2K measures sin 2 θ23 = 0.53 +0.03 −0.04 and ∆m 2 32 = (2.45 ± 0.07) × 10 −3 eV 2 c −4 . The Bayesian analyses show a weak preference for normal mass ordering (89% posterior probability) and the upper sin 2 θ23 octant (80% posterior probability), with a uniform prior probability assumed in both cases. The T2K data exclude CP conservation in neutrino oscillations at the 2σ level.

I. INTRODUCTION
The fact that neutrino flavor mixing [1] and oscillations [2] account for the apparent depletion of neutrino fluxes from natural sources is now well-established by detailed observations of these sources [3][4][5], and verified by experiments using monitored artificial sources [6][7][8]. Neutrino mixing requires that at least two of the neutrino masses (m 1 , m 2 and m 3 ) be non-zero which, in turn requires expanding upon the Standard Model. Masses require either new gauge singlets-right handed neutrinos-or a different mass generation mechanism from other Standard Model fermions, or a combination of both. The observed pattern of neutrino masses and mixing is therefore of great interest as a window onto physics beyond the Standard Model.

Unanswered questions in neutrino oscillations
The generally accepted explanation of leptonic mixing and neutrino oscillation phenomena centers around the 3×3 PMNS mixing matrix, named for Pontecorvo, Maki, Nakagawa and Sakata, which describes the neutrino mass eigenstates (ν 1 , ν 2 and ν 3 ) in terms of the weak flavor eigenstates (ν e , ν µ , and ν τ ). Under the assumption that neutrinos are Dirac particles, the matrix is conventionally parameterized as the product of three Tait-Bryan rotations (by θ ij ) and a phase transformation (by δ CP ), (1) It is well-established that all of these elements must be large, with the smallest |U e3 | 2 ∼ 1/45 and the majority of elements having magnitude-squared of at least 1/4. The top row elements are well-constrained by measurements of ν e disappearance [3,8,[10][11][12]. The U µ3 element is likewise constrained by disappearance of ν µ [4,7,13,14], but the dependence is of the approximate form |U µ3 | 2 (1 − |U µ3 | 2 ), leading to a degeneracy, commonly expressed in terms of the octant of the mixing angle θ 23 .
Since the matrix may be complex, a wide range of values for the the magnitudes of the four elements U µ1 ,U µ2 ,U τ 1 and U τ 2 are possible, depending on the phase parameter δ CP . A purely real matrix corresponds to δ CP being an integer multiple of π; any other value is manifested as violation of Charge-Parity (CP) symmetry in any neutrino appearance channel, via the Kobayashi-Maskawa mechanism [15]. The discovery of CP violation in the lepton sector is of great interest and is a major focus of current experiments [13,16]. The fact that CP violation is controlled by a single parameter means that the rate of ν e appearance is not independent of ν e appearance, so studying both channels provides a test of the standard PMNS picture.
Another approximate degeneracy is in the ordering of neutrino masses. It is known that ∆m 2 21 = m 2 2 − m 2 1 > 0, and that ∆m 2 31 > ∆m 2 21 , but the sign of ∆m 2 31 is as yet unknown. Neutrino masses (and the corresponding eigenstates) are numbered in order of decreasing ν e content. In the case where m 2 3 > m 2 1 , the predominant partner of the lightest charged lepton is the lightest neutrino. As the analogous pattern is seen in the quark sector, it is known as Normal Ordering (NO), with Inverted Ordering (IO) corresponding to a light ν 3 . This Mass Ordering (MO) degeneracy is partially lifted at higher neutrino energies by the interactions of neutrinos with matter [17,18]. This matter effect changes both the propagation eigenstates ('effective masses') of the neutrinos and their mixing with the flavor states.
These three remaining unknowns (octant, δ CP , MO) are all accessible to current-generation long-baseline neutrino experiments such as T2K, through the ν µ → ν e appearance channel and its CP conjugate ν µ → ν e . Although CP violation has the most obvious significance, the general structure of the matrix may give us a window into the deeper problem of neutrino mass and a broad range of new physics. An inverted ordering would imply the lightest neutrino is relatively weakly coupled to the lightest charged lepton. Similar in character would be resolving the octant degeneracy in favor of the upper octant, as this implies ν 3 is not the predominant partner of the heaviest charged lepton. More generally, the highly non-diagonal structure of the PMNS matrix is suggestive of an origin for neutrino masses that is separate from the electroweak physics that dominates the masses of the heavier fermions, and precision measurements of the ( ν ) µ → ( ν ) e channel can help to determine the remaining elements.

II. THE T2K EXPERIMENT
T2K is a second-generation accelerator neutrino oscillation experiment [19], utilizing a narrow-band beam and a 295 km baseline from J-PARC in Tōkai, Ibaraki to Super-Kamiokande (commonly Super-K or just SK) in Hida, Gifu. The primary proton beam is accelerated to 30 GeV by J-PARC's Main Ring. In each cycle eight bunches are extracted in a single turn and directed due west at a downward angle of −3.6 • . The intensity of the extracted beam is monitored by five current transformers that are also used to normalize the neutrino exposure between the various detectors of the experiment, while the beam profile is monitored with secondary emission monitors-most of which are removed during physics runs-and an optical transition radiation monitor. The beam power has increased over time, reaching 500 kW by the end of May 2018.
The protons impinge on a 91.4 cm graphite target to produce a secondary beam, primarily composed of pions and kaons that are focused (or defocused, according to charge) by a set of three magnetic horns [20] pulsed with a peak current of 250 kA. The focused secondary beam propagates in a 96 m Helium-filled decay volume where the secondaries can decay to produce neutrinos, among other particles. A beam dump sits at the end of the decay volume, followed by a muon monitor (MUMON) which is used to check the stability of the secondary beam.
If the horn current is run in the 'Forward' direction, positively-charged secondaries are focused, which decay to produce a beam that is primarily ν µ . This is referred to as 'Forward Horn Current' (FHC) or neutrino mode. Alternatively the horn current can be reversed, to give a beam of primarily ν µ , which is referred to as 'Reversed Horn Current' (RHC) or antineutrino mode. In either case, secondary hadrons produced in the very forward direction pass through the field-free necks of the horns and contribute to a 'wrong-sign' flux that is of order 1% of the intended 'right-sign' component in FHC mode, and order 10% in RHC mode.
The horn configuration is most effective at focusing pions with momenta between 2 and 2.5 GeV/c, resulting in a broadband neutrino flux along the beamline axis that peaks at around 1 GeV. However, the beam is directed so that its center passes roughly 4 km south of and 12 km below the Super-Kamiokande detector, equivalent to an angle 2.5 • from the beam axis, as measured from the target. This results in a narrow-band flux at the detector that peaks more strongly at 0.6 GeV-roughly the energy of the first oscillation maximum for this baseline. This dramatically reduces the flux of higher-energy neutrinos that can mimic ν e signal events through neutral current interactions.
The experiment utilizes a suite of Near Detectors (NDs) at a site 280 m downstream of the target to characterize the initial flux of neutrinos and their interactions. A high-mass monitoring detector, INGRID, is centered on the nominal beam axis and samples the beam out to about 1 • around the beam axis, as measured from the target. This allows the intensity and direction of the neutrino beam to be monitored on a daily basis, the latter to a precision of a few centimeters.
The second near detector, ND280, is optimized for measuring interaction rates and the properties of neutrino interactions. The detector is positioned off-axis at the same azimuthal angle as Super-Kamiokande to minimize the effect of any directional shift of the beam, and at a distance from the beam axis to make the neutrino spectrum as similar as possible to what would be seen in SK in the absence of oscillations. Residual differences in spectra exist because the decay volume covers a significant fraction of the 280 m baseline; these are automatically accounted for in the way the ND280 data is used in the analysis. The ND280 off-axis detector provides the most relevant information on the neutrino flux and interactions, but more importantly allows the analysis to form a multi-dimensional constraint, incorporating correlations between flux and interaction uncertainties. The ND280 additionally has a 0.2 T magnetic field to provide charge identification of detected leptons and therefore separately constrain the neutrino and antineutrino components of the beam, improving sensitivity to CP violation. This is particularly important for the RHC data set which has a larger (∼ 30%) background of 'wrongsign' events.
Neutrinos that interact in SK produce signals that can be analyzed in one of five different event categories. At T2K's beam energy, the lepton produced in a chargedcurrent interaction will be either µ ± or e ± , so events are categorized as muon-or electron-like based on the pattern of Cherenkov light in the detector. SK has no magnetic field, so the sign of the horn current is used to categorize events as a proxy for neutrino/antineutrino identification, and the kinematics give some discrimination between signal events and backgrounds. Events without a lepton-like Cherenkov ring and events where there is evidence of additional hadronic activity are not used, except for a fifth class of events which selects FHC ν e -like events with evidence of a low energy positron from an ejected π + . These selections will be discussed in greater detail in Sec. IX B.
Because the atmospheric term is proportional to |U µ3 | 2 |U e3 | 2 , the ν e appearance probability is free of the octant degeneracy seen in the ν µ disappearance probability. In practice the ν µ channel remains an important part of the fit, because there is a larger number of events, which gives higher precision on ∆m 2 3i and sin θ 23 . The disappearance channel is also important because it is relatively insensitive to sin θ 13 , δ CP , and mass ordering, which helps to reduce the impact of degeneracies in the appearance channel.
The amplitude of the interference term is about 20% of the atmospheric term, making it possible to measure the phase δ CP . This interference term is of particular physical importance as the relative sign of δ CP and ∆ ji changes between neutrinos and antineutrinos, leading to CP violation if δ CP is not an integer multiple of π. Since T2K's event spectrum peaks very close to the oscillation maximum at ∆ 32 ∼ π/2, the contribution from the interference term is a direct measure of the amount of CP violation in the neutrino sector.
The interference term also depends directly on sign(∆m 2 31 ), through the sin ∆ 31 part. Up to a small difference between ∆m 2 31 and ∆m 2 32 , this is degenerate with a substitution δ CP → π − δ CP . This means the same data will prefer opposite signs of (δ CP − π/2) for normal and inverted orderings. This degeneracy is lifted if the mass ordering is assumed, or if one or other ordering is strongly preferred in the fit.

A. Matter effects
The (approximately constant) density of electrons in the Earth's crust along the T2K baseline changes [22] the propagation Hamiltonian (c.f. Eq. (3)): where n e is the number density of electrons, G F is the Fermi coupling constant, and the minus sign applies for antineutrinos. The impact can be understood using T2K's beam energy of ∼ 0.6 GeV, and a crust density [23] of 2.6 g cm −3 , in which case: Because of the size ordering of terms in the Hamiltonian, the matter effect dominates the solar term, but is only a small perturbation compared to the atmospheric and interference terms. In practice, because the resulting oscillation probability is dominated by the atmospheric term, T2K's sensitivity to the mass ordering comes mostly from this perturbative effect on |U e3 | and ∆m 2 31 . In this regime the matter effect can be described with a dimensionless parameter which is positive for neutrinos in the normal ordering and for antineutrinos in the inverted ordering. The resulting probability for neutrinos or antineutrinos is where the CP-violating sin δ CP term takes a negative sign for neutrinos and positive sign for antineutrinos. For T2K, the largest observable effect from propagation in matter is the [1−ξ] −2 scaling of the atmospheric term. This modification to the leading atmospheric term is about 5%, leading to a roughly 10% difference in the appearance probability for neutrino and antineutrinos. Since this difference is about half the amplitude of the CP-violating term, it is in general difficult to disentangle the two phenomena if they have opposite effects on the total number of events observed, or if the value of sin δ CP is close to zero. However if both phenomena enhance (or both suppress) the total number of events, then the net effect can be too large to attributable to either source alone, and there will be much less ambiguity.

B. The survival probability
In the same notation, the ( ν ) µ survival probability at T2K is to a good approximation given by: where ∆ Atm = ∆ 32 + ∆ 21 × T µµ 1 /(1 − T µµ 3 ). So although the observable survival probability is not sensitive to the mass ordering, the best-fit value of ∆ 32 is different for normal and inverted orderings. As for the oscillation amplitude, in terms of the standard parameterization |T µµ 3 | = sin 2 θ 23 cos 2 θ 13 , so the amplitude reaches a maximum value around sin 2 θ 23 = 1/(2 cos 2 θ 13 ), and values on either side of this are degenerate. Propagation in matter does not change the survival probability by much; matter dependent effects are suppressed by a factor of T ee 3 (1 − 2T µµ 3 ) [23]. Since the matter density can be approximated as symmetric, the probability also has no dependence on sin δ CP [24]. However the relationship between ∆ 32 and the observable ∆ Atm does depend on cos δ CP through T µµ 1 , which can for some parameter combinations give rise to a correlation between the measured ∆ 32 and δ CP .

IV. UPDATES SINCE THE PREVIOUS RESULTS
This analysis uses a SK data set collected up to the end of May 2018. This corresponds to an exposure of 14.94 × 10 20 Protons on Target (POT) in FHC mode and 16.35 × 10 20 POT in RHC mode, the same as used to report indications of CP violation in [16]. A detailed breakdown is given in Tab. I. Compared to the previous update [25] this is a nominal increase of 1% in FHC mode, but 116% in RHC mode, which is particularly of interest for indications of ν e appearance, described in Sec. XIV.
In parallel with the statistical increase, our event selection has been refined since it was last described in detail [26]. Event reconstruction is now based on an algorithm that matches the pattern of light observed in SK directly [27]. This makes use of more information about the event, providing better discrimination between event categories, and improving the resolution of the lepton momentum and vertex location. As a result, the fiducial volume can also be expanded, roughly equivalent to a 20% increase in statistics for the ν e samples, as described in Sec. IX. The newer reconstruction algorithm has previously been used for rejecting neutral current π 0 events in the ν e samples [28,29] and for all aspects of event selection and reconstruction in more recent publications [16,25].
A large fraction of the analysis development focuses on the interaction model, which incorporates constraints from a number of new external data sets and theoretical improvements. Since reported in [26], the dominant charged-current quasi-elastic (CCQE) models have been updated in various respects, including: the handling of weak charge screening in nuclei; the handling of nucleon removal energy and its effect on lepton kinematics; and additional freedom allowed in the kinematic dependence of interactions involving correlated nucleon pairs (2p2h). Modeling of (and uncertainties assigned to) subdominant processes have also been improved, including coherent scattering and neutral current interactions.
Constraints on neutrino oscillations and the associated parameters come from a combined analysis of disappearance and appearance channels across FHC and RHC configurations, using the same approach as in [16]. This paper provides a fuller description of the method and a broader range of results.

V. ANALYSIS OVERVIEW
The T2K near and far detectors have different target nuclei and are based on different particle detection techniques. The T2K oscillation analysis therefore uses parameterized models of the neutrino beam flux and the neutrino interaction cross section to propagate near detector information to predict the far detector event rate.
The neutrino flux prediction has been described in detail in Ref. [30]. The collision of 30 GeV protons from the J-PARC main ring with the T2K neutrino production target is simulated using FLUKA [31][32][33]. The resultant secondary particles are passed to a GEANT3 [34] simulation of the magnetic focusing horns and decay volume downstream of the target. GCALOR [35,36] is used to model hadronic interactions of the secondary particles as they traverse the focusing horns and decay volume. Particles are then allowed to decay to produce neutrinos. Data from proton beam monitors is used to tune the initial proton beam parameters in the simulation. NA61/SHINE, a fixed-target experiment at CERN's Super Proton Synchrotron, measures particle production in nucleus and hadron collisions with a large acceptance spectrometer. This includes measurements of the collisions of 30 GeV protons with graphite. Data from the NA61/SHINE [37][38][39][40] experiment are then used to tune the secondary particles produced from the target. Finally, the INGRID [41] on-axis near detector is used to monitor the neutrino beam direction. The uncertainty from each of these measurements are combined with uncertainties from the beam simulation to give the final flux uncertainty. This is parameterized as a function of neutrino energy, neutrino species and whether the beam is operating in FHC or RHC mode. Figure 1 shows the predicted neutrino fluxes at SK for both the FHC and RHC modes. Previous T2K flux estimates [30] used thin target hadron production data collected in the NA61/SHINE experiment in 2007 [37][38][39], where a 31 GeV proton beam impinged upon a graphite target with a thickness of 4% of a nuclear interaction length (the so-called thin target). The work presented here uses an updated flux prediction based on higher statistics thin target hadron production data collected in the NA61/SHINE experiment in 2009 [40], including the yields of π + , π − , K + , K − , K 0 s , Λ and p. Future analyses will include NA61/SHINE hadron production measurements on a replica of the T2K neutrino production target. Figure 2 shows the fractional uncertainty on the ν µ flux at SK in FHC mode, on the wrong-sign ν µ flux at SK in FHC mode and on the right-sign ν µ flux at SK in RHC mode. The improvement obtained by including the 2009 NA61/SHINE thin target data is also indicated. The T2K neutrino interaction model is described extensively in Section VI. The model incorporates a number of tunable parameters whose prior uncertainties and nominal values come from comparisons to electron and neutrino scattering data sets. There are a number of models that agree with existing data equally well, and it is not always possible to parameterize the differences between these models. In this case simulated data studies have been performed using these alternate models, as described in Section XII.
The neutrino flux and interaction models are fit to data collected by the T2K off-axis near detector, ND280 [19]. The fit varies the parameters within both models simul-  taneously to best match the ND280 data, as described in Section X. This results in tuned flux and interaction models with correlated uncertainties, providing a more accurate and precise prediction of the event rate at Super-Kamiokande.
At the far detector a simultaneous fit of muon-like and electron-like samples from both neutrino and antineutrino beams is used to constrain the PMNS oscillation parameters. The conventional {θ ij , δ CP } parameterization is used, enforcing unitarity, and the effect of propagation in matter is included. Data from ν e and ν e disappearance experiments are used to constrain the parameters (θ 12 and ∆m 2 21 ) [42] that T2K has little sensitivity to. Fits are performed both with and without an external constraint on θ 13 [43]. Systematic uncertainties are treated by a numerical marginalization technique: all parameterized uncertainties are randomly sampled many times according to prior constraints, including ND280 data, and the likelihoods averaged over the ensemble. This process is described in detail in Section XI, with the results described in Section XIII.

VI. NEUTRINO INTERACTION MODELING
Oscillation parameter values are inferred from spectra of observable quantities, herein either reconstructed charged-lepton kinematics, (p , θ ), or reconstructed neutrino energy. The reconstructed neutrino energy is estimated from final-state charged-lepton kinematics only as where M N,i , M N,f , and M are the mass of the initialstate nucleon (an effective, off-shell mass that includes the 'nucleon removal energy' is often used), final-state nucleon, and final-state charged lepton respectively; E , p , and θ are the energy, three-momentum, and angle of the final-state charged-lepton respectively. E Rec QE provides a smeared but minimally-biased estimate of the neutrino energy for quasi-elastic neutrino scattering off bound nucleons (CCQE). For other interaction channels, such as those that produce extra hadrons, E Rec QE underestimates the energy of the interacting neutrino.
The procedure of inferring oscillation parameter values from observable quantities implicitly relies on an accurate understanding of the rate of background processes and a mapping between true energy and observable quantities, e.g. E Rec QE (E ν ), both of which are derived from simulation. As a result, accurate neutrino interaction modeling is critical. Event selections are trained on the simulated distribution of final-state particles and the predicted rate of various signal and background processes. The predicted rate of a number of neutrino interaction processes, which exhibit different true-to-observable mappings, is constrained by near detector data and then used to interpret the observed far detector data. This section briefly describes the neutrino interaction model, accounted-for freedom within the model, and specific studies used to test resilience to known weaknesses of the model.

A. The Base Interaction Model
The samples of simulated neutrino interactions used in this analysis were made with version 5.3.3 of the neut interaction generator [44]. neut simulates known neutrino interaction channels relevant for few GeV neutrinos, these channels are broadly categorized as: 1p1h, 2p2h, single pion production, and deep inelastic scattering (DIS). In addition to the 'primary' interaction channels, the effect of using nuclear targets, where the struck nucleons are bound within a nuclear potential, needs to be modeled well. These effects can be separated into initial-state and final-state effects. Most updates to the interaction model since the previous analysis [26], are in the treatment of systematic uncertainties; however, a short description of the whole model is included here for completeness. As the 'base' model has not changed, the interested reader is directed to Ref. [26] and Ref. [45] for a discussion of the motivations behind any specific model choices.
a. Initial-state nuclear effects: Nucleons bound within a nuclear potential undergo non-negligible 'Fermi motion'. For carbon, this means bound nucleons have a momentum of p f < ∼ 217 MeV/c, or equivalently, a Fermi energy of E f < ∼ 25 MeV. A Global Relativistic Fermi Gas (GRFG) is used to model the initial-state nucleon momentum distribution in this analysis. Neutrino interactions with bound nucleons are largely handled under the impulse approximation, whereby a single 'struck' nucleon receives a four-momentum kick while the rest of the target nucleus acts as a group of non-interacting 'spectator' nucleons. This rudimentary nuclear model is a simple approximation for the correct modeling of the initial nucleon momentum distribution and nucleon removal energy, a study, presented below, accounts for the effect of this approximation.
b. 1p1h: One particle, one hole interactions are those where the neutrino interacts quasi-elastically with a single bound nucleon-the interaction is only quasielastic because of the bound nature of the target nucleon and, for charged current events (CCQE), the initial-tofinal-state charged-lepton and nucleon rest mass difference. Such interactions are modeled in the Lewellyn-Smith formalism [46], using the BBBA05 [47] description for the vector part of the nucleon form factors, and a simple dipole form for the axial part. The neut model includes two additional features of note: the nucleon removal energy, 'NRE', and in-medium modifications to the W boson propagator via the Random Phase Approximation ('RPA'). Variations in the average nucleon removal energy modify the predicted kinematics of finalstate particles, most importantly charged leptons. When comparing predictions based on Fermi Gas nuclear models to 1p1h-like cross-section data, a suppression at low four-momentum transfer is favored relative to the freenucleon-target calculation [48]. This is often attributed to a weak-charge screening effect as a result of the nuclear medium [49]. The effect is termed 'RPA' after the 'Ran-dom Phase Approximation' technique used to sum up the series of contributing W-boson self-energy diagrams. Here, the distribution of four-momentum transfer is modified by the RPA calculation from Nieves et. al. [49]. As can be seen in Fig. 3, 1p1h is the dominant interaction channel at T2K energies.
The total charged-current cross section for muon neutrinos interacting with a carbon nucleus, as predicted by neut, overlaid on the ND280 muon neutrino flux, and an example oscillated muon neutrino flux at SK. The oscillation parameters used here are the best fit from the previous analysis [26]. The total (Inc) cross section is separated into 1p1h, 2p2h, single pion production (SPP), and deep inelastic scattering (DIS) channels.
c. 2p2h: Two particle, two hole interactions are an inherently nuclear-target process, whereby the incoming neutrino interacts with a bound pair of nucleons, knocking both out of the nuclear potential. The Nieves et. al. model [50] is used to predict the cross-section as a function of lepton kinematics. While this process is subdominant, it produces observable final states that are indistinguishable from 1p1h interactions in the T2K detectors, but with different observed lepton kinematics as a function of neutrino energy. In the Nieves et. al. 2p2h model, there are two distinct regions of strength in the energy and momentum transfer space: the quasi-elasticlike (energy transfer, q 0 < ∼ 0.3), and Delta-like regions (q 0 > ∼ 0.3). The energy-momentum transfer distribution and the corresponding E Rec QE biases can be seen in Fig. 4. d. Single pion production: Single pion production can be separated into three sub-processes: resonant, nonresonant, and coherent single pion production. The resonant and non-resonant processes describe the production of a pion involving neutrino scattering of a single nu- The two peak structure is clear, with QE-like kinematics corresponding to the lower left peak, and Delta-like kinematics to the stronger central peak. Bottom: The reconstructed energy bias at SK is shown for 1p1h and 2p2h events for an oscillated muon neutrino flux. The different reconstructed energy smearing for 2p2h events with QE-like and Delta-like interaction kinematics can be seen.
cleon, either via an intermediate baryon resonance (resonant), or not (non-resonant). These processes are modeled in the Rein-Sehgal formalism [51], with an improvement that includes the effect of the final-state chargedlepton mass [52], and updated nucleon axial form factors from Graczyk & Sobczyk [53]. The contributions from 17 baryon resonances are considered, with the ∆(1232) being dominant, and interference terms between the resonances are taken into account. The non-resonant channel augments the production of half-unit isospin final states (e.g. ν + n → − + p + π 0 and ν + n → − + nπ + ); any interference between the resonant and non-resonant contributions is ignored. These processes are used to model final states with an invariant hadronic mass of W ≤ 2.0 GeV/c 2 . The modeling of the so-called 'transition region' between single pion production off a nucleon and shallow-and deep-inelastic scattering is an unsolved theoretical problem [54]. In the neut model, the region 1.3 ≤ W ≤ 2.0 GeV/c 2 contains contributions from both the Rein-Sehgal single pion model, described above, and the deep-inelastic-scattering model, described below. For higher invariant masses the deep inelastic scattering model is used. The axial form factors and the strength of the nonresonant contribution in the Rein-Sehgal model were tuned to published cross-section data using the NUI-SANCE framework [55]. As these parameters control only the nucleon-level interaction, the central values were determined from fits to deuterium-target bubble chamber data, which is largely free from nuclear effects. Data from ANL [56] (with some reanalyzed distributions taken from Ref. [57]) and BNL [58] was used. The uncertainties determined from the fits to bubble chamber data were then inflated to approximately cover cross-section data from MiniBooNE [59] and MINERνA [60,61].
Coherent single pion production describes the interaction of a neutrino coherently with a whole nucleus. This is a sub-dominant pion production process, observed at low energy for the first time by the MINERνA experiment [62] and is characterized by very little four momentum transfer to the struck nucleus. We follow the preference of the MINERνA data and use the Bergher-Sehgal model [63].
e. Deep inelastic scattering: For interactions producing hadronic systems with two or more pions and invariant hadronic masses of W > 1.3 GeV/c 2 , the cross section is constructed from nucleon structure functions that depend on the Bjorken scaling variables x and y. The structure functions are calculated from the GRV98 [64] parton distribution functions, with modifications from Bodek et. al. [65] to account for the relatively low momentum transfers involved. For interactions with 1.3 < W ≤ 2.0 GeV/c 2 the hadronic state is generated by a custom multi-pion production model, above W = 2.0 GeV/c 2 PYTHIA 5.72 is used [66].
f. Final-state nuclear effects: After the primary neutrino interaction has been simulated, a number of additional 'nuclear effects' are included. For interactions that produce a final-state proton or neutron, the Pauli exclusion principle is applied, rejecting any events that produce a nucleon below the Fermi energy. This results in a suppression at low four-momentum transfer for 1p1h events. Final-state hadrons produced at the neutrino interaction vertex are stepped through the nuclear medium in a classical cascade, in which they may: interact and produce secondary particles, be absorbed, or undergo charge exchange (e.g. π + + n → π 0 + p). Such re-interactions are often called 'Final-State Interactions', or FSIs. Finally, after the primary interaction and hadronic cascade, the remnant nucleus can be left in an excited state that will subsequently decay. For interactions on oxygen, nuclear de-excitations that result in secondary, low energy photons (O (1 − 10) MeV) are modeled following Ref. [67].
With the exception of 2p2h interactions, these channels and effects are also implemented for neutral current interactions, but the details are not repeated here for brevity. The total charged-current cross-sections, broken down by interaction channel, are shown in Fig. 3.

B. The Uncertainty Model
As the number of observed events included in the analysis grows with exposure, a robust interaction uncertainty model is required to assess the significance of the results. The uncertainty model for 1p1h and 2p2h interactions have seen recent improvements and will be discussed in detail here. For details on other, unchanged sources of interaction uncertainty see Ref. [26] Section III.
a. 1p1h The neut 1p1h interaction model implements three main sources of uncertainty: the mass used in the dipole axial form factor (M QE A ), the effect of RPA on the cross section as a function of four-momentum transfer, and the Nucleon Removal Energy associated with scattering off a bound nucleons.
In this analysis, M QE A does not have a prior uncertainty and is constrained by near detector data alone. The parameterization of the uncertainty on the RPA suppression has been updated in this analysis; the previous implementation proved problematic because variations of different free parameters effected a similar response in Q 2 . For this analysis, Bernstein polynomials were used to model the shape below some Q 2 cutoff, U , above which an exponential decay form is used: where x = Q 2 and x = Q 2 /U. To ensure continuity at Q 2 = U , the conditions: are enforced, leaving four free parameters, A, B, C, D. The fifth, U , is kept fixed at 1.2 GeV 2 . This parameterization has been termed 'BeRPA' after the Bernstein polynomials on which it is based. The effect of varying each of the four free parameters relative to the theoretical uncertainty calculated by following Ref. [68] can be seen in Fig. 5. Together, the four free BeRPA parameters and the M QE A parameter give effective freedom over a range of Q 2 . The Q 2 distribution is then largely constrained by the fit to near detector samples. b. 2p2h The details of the 2p2h process are highly uncertain. The total cross-section, evolution with neutrino energy, and energy and momentum transfer characteristics of the process are all predicted differently by the available models (e.g. Nieves et. al. [50], Martini et. al. [69], SUSAv2-MEC [70], GiBUU [71]). While data from the T2K near detector [72][73][74], MINERνA [75,76], and NOνA [77] favor a process with similar interaction kinematics to such a multi-nucleon process 1 , experimental sensitivity to this exclusive channel is weak. As a result, significant freedom is afforded to the 2p2h process in this analysis.
The uncertainty on the 2p2h process is separated into normalization and shape components. An overall 100% normalization uncertainty is separately assigned to interactions involving neutrinos and those involving antineutrinos. An additional parameter that introduces freedom in the relative normalization of 2p2h interactions with carbon and oxygen nuclei is used. Finally, a parameter that varies the relative strength of the QE-like and Deltalike components, while keeping the total cross-section for 2p2h events constant, is introduced; the effect of extreme variations of this parameter are seen in Fig. 6.

C. Simulated data studies
It is strongly suspected that the described uncertainty model may not cover all differences between nature and the interaction model described above. To begin to address this, we perform fits of the model to targeted 'simulated data sets' that test the robustness of the model and associated uncertainties to known missing features. In some cases the results of these studies are used to motivate additional uncertainties. This section introduces the simulated data sets that were analyzed to address specific concerns, the results of the fits will be discussed in Section XII.
a. Alternative 1p1h Nuclear Models The GRFG used to model the nuclear initial state is a simple model that contains no correlations between initial momentum and Nucleon Removal Energy (nre). Such correlations may be important for correctly modeling the observed charged lepton spectrum [78] and are seen in nuclear response measurements from electron scattering experiments. To test the robustness of the implemented uncertainty model to such details, two simulated data sets are used: the Nieves et. al. 1p1h model, and the spectral function model of Benhar et. al. [78] (BSF). Both contain some correlation between the initial momentum and the nre. The Nieves et. al. model differs from the base model by implementing a local, rather than a global, Fermi Gas (LFG), in which the concept of a radially-dependent nuclear density profile introduces such correlations. In the BSF model, initial nucleons are chosen from a full, two dimensional nuclear response distribution, which is constructed from (e,e p) data [78]. It should be noted that the BSF model contains no 'RPA'-like, low fourmomentum-transfer suppression effect. In constructing the simulated data, only the 1p1h cross section is modified. The predicted final-state muon kinematics for each model are shown in Fig. 7.
b. Nucleon Removal Energy The implementation of the nre parameter was revised for this analysis. The previous implementation relied on calculating the change in the predicted differential cross-section for a variation nre → nre , σ CCQE (E ν , p , θ , nre); this proved problematic as variations of nre modify the available kinematic phase space for the production of final-state muons. For some simulated interaction, (E ν , p , θ ), and binding-energy variation, nre , the variation weight, w = σ CCQE (Eν,p ,θ ,nre ) /σ CCQE (Eν ,p ,θ ,nre), will be illdefined when the denominator is vanishingly small. Instead, an effective implementation was used that shifts the final-state charged-lepton momentum in response to variations of nre. The momentum shifts were calculated in bins of true neutrino energy and true final-state charged-lepton polar angle-in the neut implementation, variations of the binding energy effect only small changes in the final-state lepton angular distribution. An example of such a variation can be seen in Fig. 8. The prior uncertainty on the new nre parameterization was taken as 18 MeV-this large uncertainty is motivated in part because of implementation choices in neut [79] and in part because of uncertainties on the analyzed electronscattering measurements.
The uncertain nre parameter was not included as a free parameter in the near detector fit for this analysis. Instead, an extremal variation based on the results in Ref. [79] was included as a simulated data set.  As previously mentioned, the modeling of 2p2h interactions is highly uncertain. We include a simulated data set based on an alternate 2p2h calculation by Martini et. al. [69]. This calculation predicts a larger inclusive 2p2h cross section than the Nieves et. al. calculation-importantly increasing the relative neutrino/antineutrino 2p2h strength-and is thus an instructive alternate model. d. Kabirnezhad Single Pion Production The Rein-Sehgal model accounts for interference between pion production channels that include a baryon resonance, however, interference between resonance and non-resonance channels is neglected. A new model, developed by Kabirnezhad [80], overcomes this limitation and is used to build a simulated data set.
e. Data-driven CC 0π E ν − Q 2 dependence The untuned ND280 CC 0π sample prediction underestimates the data by approximately 5%. This sample is largely composed of 1p1h and 2p2h interactions but with a significant contribution from interactions that produce a pion which is then absorbed before leaving the nucleus. The 2p2h interaction can be further classified into events with and without a virtual ∆(1232) particle. Simulated data sets are created by assigning the observed CC 0π data- simulation discrepancy to either the 1p1h or 2p2h event categories. At the near detector the event category is weighted in bins of lepton momentum and angle so that the simulation matches the data. This weighting is then projected as a function of neutrino energy and Q 2 and applied to the far detector simulation to create the simulated far detector data. f. Coulomb Correction As the final-state charged lepton leaves the nuclear potential, it undergoes a small momentum shift because of interaction with the Coulomb field of the nucleus. In addition, the Coulomb po-tential results in a small variation in the relative neutrino/antineutrino cross section.
The effect of the Coulomb potential was not included in the base model, and thus a simulated data set was included in which a momentum shift was applied to final-state (anti-) muons and (anti-) electrons, following Ref. [81], and the relative charged-current cross section for neutrinos and antineutrinos was varied by 3%.

VII. NEAR DETECTOR DATA
The history of protons on target delivered to the T2K beamline until the end of May 2018 is shown in Fig. 9. Data for runs 1-9 in the muon monitors and the on- axis INGRID detector are shown in Fig. 10. The rate is stable throughout the run periods, and the horizontal and vertical beam positions are stable to less than 1 mrad throughout all of the run periods. The off-axis near detector, ND280, is located 280 m upstream of the beam source. It consists of several subdetectors inside a 0.2 T magnet. Charged current (CC) ν µ and ν µ neutrino interactions are selected in the tracker region, which is composed of two fine-grained detectors (FGD1 and FGD2) [82] interleaving three time projection chambers (TPC) [83]. The FGDs provide the target mass for neutrino interactions. The first FGD consists of 30 layers each composed of 192 plastic scintillator bars. The bars in each layer are oriented perpendicularly to the neutrino beam direction and to the bars in the preceding layer. Each pair of layers forms a single module providing a three-dimensional position for charged particles passing through it. The second FGD consists of alternating plastic scintillator modules and water panels. There are seven scintillator modules interspersed with six water panels, providing a water target for neutrinos to interact within. This allows effects relating to neutrino interactions on water to be isolated from those on carbon, reducing the uncertainty in extrapolating the event rate measurement from ND280 to SK. The TPCs measure both the curvature of charged particles in the magnetic field of ND280 and the energy lost by the particles as they travel through the TPC gas. The curvature of the particles provides a precise measurement of their momentum and charge, while the energy loss allows the particle species to be identified. Only ND280 data from runs 2-6 are used in this analysis, a smaller sample than for SK. Data quality is assessed weekly, and the total ND280 data taking efficiency across runs 2-6 was ∼ 93%. The near detector analysis uses a total exposure of 5.81×10 20 POT in FHC and 2.84×10 20 POT in RHC, as shown in Tab. I.
The event selection for FHC is unchanged since the analysis described in Ref. [26]. The highest momentum, negatively charged track in each event is selected as the lepton candidate. The candidate track must start within the fiducial volume of FGD1 or FGD2 and be identified as muon-like by the TPC. This produces a selection of charged-current (CC) ν µ interactions. The selected events are divided into three samples for each FGD, based on the reconstructed pion multiplicity. Positive pions are identified in three ways: a positive charged FGD-TPC track with energy loss consistent with a pion; a positively charged FGD-contained track with charge deposition consistent with a pion; or a delayed energy deposit in the FGD due to stopped π + → µ + → e + decays. Negatively charged, minimally ionising TPC tracks are identified as negatively charged pions. Neutral pions decay instantaneously to pairs of photons, which can then convert to electron-positron pairs. TPC tracks with charge depositions consistent with an electron are used to identify these decays.
The three FHC CC sub-samples are CC 0π, which is dominated by CCQE interactions, CC 1π + , which is dominated by CC resonant single pion production, and CC Other, which is dominated by interactions producing multiple pions. The reconstructed muon momentum and angle of the selected data and simulation events in the FHC CC 0π and CC 1π + samples are shown in Fig. 11 and Fig. 12, for both FGD1 and FGD2. The numbers of events recorded in each sample and the expectation prior to the ND280 fit are shown in Tab. II. The event selection for ν µ and ν µ interactions in the RHC beam mode is unchanged since the previous analysis [26]. These selections differ from the FHC selections described above in two important ways. As a larger number of interactions are produced by "wrong-sign" neutrinos, selections of both ν µ and ν µ interactions are used in the RHC beam. Taking into account differences in the flux and cross-section, the wrong-sign contamination is approximately 30% in the selected RHC samples compared to 4% in the FHC samples.
The selected ν µ (ν µ ) CC candidate events are divided into two samples for each FGD, based on the number of reconstructed tracks crossing a TPC. These are CC 1-track, which is dominated by CCQE-like interactions, and CC N -track, which is dominated by interactions producing pions. The events are not divided according to the number of observed pions, unlike the FHC selections, due to the lower interaction rate for antineutrinos. The reconstructed muon momentum and angle of the selected events in these samples for FGD1 are shown in Fig. 13 and Fig. 14 respectively.
In total there are 14 ND280 event samples: six for FHC (CC 0π, 1π + and Other, for FGD1 and FGD2), four for right-sign RHC (CC 1-track and CC N -Track, for FGD1 and FGD2) and four for wrong sign RHC (CC 1-Track and CC N -Track, for FGD1 and FGD2). The number of observed and predicted events for each sample are shown in Tab. II.

VIII. SUPER-KAMIOKANDE DATA AND SIMULATION
The Super-Kamiokande detector [84] consists of a cylindrical tank filled with 50 kilotonnes of pure water, located in the Mozumi mine in Hida, Gifu. An overburden of 2700 meter-water-equivalent provided by Mount Ikeno suppresses the cosmic ray muon flux by five orders of magnitude. Photo-multiplier tubes (PMTs) are supported by a 55 cm wide steel structure, placed 2 m away from the tank walls, which divides the detector into two optically separated regions. The outer detector (OD) region, used to identify events with entering particles, is lined with reflective material and viewed by 1885 8 PMTs. The inner detector (ID) region contains 32 kilotons of water and is instrumented with 11146 20 PMTs which make up 40% of the detector's inner surface. The high density of PMTs in the ID allows for the imaging of the ring-like light patterns projected on the detector walls by particles traveling above the Cherenkov threshold in the water.

A. Super-Kamiokande data
Pulses on PMTs exceeding a charge threshold corresponding to roughly 0.1 photo-electrons are registered as hits, all of which are processed by a software trigger system [85]. For T2K analyses, all hits occurring in the 1 ms windows centered on each beam spill arrival are written to disk. Beam spills are excluded from the analysis if they coincide with problems in the data acquisition system or the GPS system used to synchronize SK with the accelerator at J-PARC. Additionally, spills that occur within 100 µs of a beam-unrelated event are rejected to reduce the contamination of T2K data with cosmic ray muon decay electrons. The beam spill selection introduces an inefficiency of 1%, with roughly half of this being due to the preceding detector activity criterion.
For the analysis presented here, events associated with accepted spills are further required to have a reconstructed energy corresponding to an electron of at least 30 MeV and no more than 15 hits in the largest OD hit cluster. Additional criteria are used to reject spurious events that originate from spontaneous corona discharges in PMTs. Only events reconstructed in the [−2, 10] µs window around the leading edge of the beam spill are used in the analysis.
Distributions of the reconstructed times for events in both 1 ms and [−1.2, 5.6] µs windows around the beam spill arrival are shown in Fig. 15. In the 1 ms window, a peak of events coincident with the beam arrival is clearly seen; after applying the OD and minimum energy criteria very few events remain outside this peak. The eightbunch structure of the J-PARC beam is clearly seen in Events at SK are simulated using J-PARC (anti-) neutrino flux predictions and the neutrino interaction generator NEUT, which implements the neutrino interaction model described in detail in Section VI. Particles resulting from the neutrino interactions are propagated through an SK detector model using the same Geant3based [34] SKDETSIM 13.90 package as in [26]. The detector model, including the optical properties of the ultra-pure water and detector materials, is tuned to calibration data [86].

IX. SUPER-KAMIOKANDE EVENT RECONSTRUCTION AND SELECTION
Events at SK are reconstructed with the FiTQun maximum likelihood estimation algorithm [27]. While this algorithm was initially used exclusively for NCπ 0 background suppression in the ν e appearance channel [26,28], in recent T2K publications [16,25] FiTQun was used for all aspects of event reconstruction. Updating the reconstruction tools prompted a re-optimization of the event selection criteria, including an expansion of the fiducial volume (FV).
In this section, the reconstruction algorithm is briefly described, as well as the updated event selection criteria and the procedure for their optimization. A discussion of the systematic uncertainties related to the SK detector concludes the section.

A. Event reconstruction algorithm
The FiTQun likelihood function consists of the probability of each PMT registering a hit in a given event, and for hit PMTs, the probability density functions for the charge and time of the hit. Particles in an event are described by tracks (or track segments) parameterized by particle type, momentum, direction and initial position. The FiTQun likelihood is a function of these track parameters and multiple tracks can be combined to form complex event hypotheses.
In an initial pre-fitting stage, the approximate location of the neutrino interaction is found with a simplified likelihood using only the time of the PMT hits. A   residual time is calculated for each PMT hit by subtracting the Cherenkov photon time-of-flight, calculated using the straight-line distance from the vertex position to the PMT, from the hit time. Hits are associated to one or more clusters in residual time, with the initial cluster containing hits due to particles produced in the neutrino interaction and subsequent clusters containing hits due to products of weakly decaying prompt particles. Each hit cluster is then reconstructed separately by maximizing the likelihood function for the e, µ, π + and p singleparticle hypotheses.
For the earliest hit cluster only, multiple-track event hypotheses are also reconstructed using the results of the single-particle fits as the starting point. A multi-particle search algorithm is used to determine the number of particles observed in the event. This algorithm proceeds by iteratively adding a new electron-like or π + -like track to the event until the best-fit likelihood after adding the new track fails to improve beyond a set threshold. In the analysis described here, additional event hypotheses targeting neutral current backgrounds are used: a π 0 hypothesis consisting of two electron-like tracks consistent with a π 0 → γγ decay, and a π + hypothesis with two track segments compatible with a π + undergoing a hard scatter.

B. Event selection
Events are selected into samples using cuts on best-fit likelihood ratios between signal-like and background-like hypotheses: Λ α β def = log Lα L β , where α and β are competing hypotheses. The cut points are typically parameterized as a function of reconstructed kinematics, such as the best-fit electron momentum or the reconstructed invariant mass obtained from the π 0 hypothesis best-fit kinematics.
Five signal-enriched SK samples are used in the analysis. Samples of events containing a single reconstructed µ-like ring (1R µ ) and a single reconstructed e-like ring (1R e ) target ν µ and ν e CCQE interactions in both FHC and RHC beam modes. An additional sample, used in FHC data only, targets CC 1π + interactions where the π + is below Cherenkov threshold. The π + is identified by the detection of a delayed µ-decay electron following the single prompt electron which results from the CC interaction (1R e + 1 d.e). The CCQE-like selection criteria are the same for FHC and RHC samples.
Events in all samples are required to be fully contained (FC) in the ID using the cut on OD activity described in Sec. VIII above and to have only one prompt reconstructed particle identified by the multi-particle iterative search algorithm. Events are separated into e-like and µ-like with a criterion based on the likelihood ratio of the best-fit e-like to µ-like hypothesis (Λ e µ ) and the reconstructed momentum for the e hypothesis (p e ).
The FV criteria are defined in terms of the distance from the event vertex to its closest point on the detector walls (wall) and the distance from the event vertex to the detector wall along the track direction (towall). This parameterization of the FV allows for a larger volume of the detector to be used by reducing the wall threshold compared to previous T2K neutrino oscillation analyses, while ensuring that Cherenkov rings projected on the detector walls illuminate a large number of PMTs with the towall criterion, introduced for the first time in the analysis described here. The wall and towall criteria are chosen separately for each sample to maximize the sensitivity to θ 23 and δ CP , as described in Sec. IX D. For the µ-like samples, a minimum wall of 50 cm is required, with a minimum towall of 250 cm. The requirements for the e-like samples with no decay-e are wall > 80 cm and towall > 170 cm, while for the sample with one decay-e wall > 50 cm and towall > 270 cm are required.
For both FHC and RHC FC events, the distributions of the number of reconstructed particle tracks are shown in Fig. 16. For events with a single reconstructed track, the distributions of the e/µ discriminator and number of identified µ-decay electrons are shown in Figs. 17 and 18 respectively. In these figures, and throughout this section, the MC predictions are produced with the neutrino mixing parameters given in Tab. III, and the flux and cross-section parameters set to the best-fit value resulting from the near detector analysis described in Sec. XI.
The reconstructed momentum is required to be larger than 100 MeV/c for the e-like samples to reduce contamination from below-threshold-µ decays, and larger than 200 MeV/c for the µ-like samples.
Events in the µ-like samples can have up to one reconstructed decay-e, while the e-like samples are required to have zero and one decay-e for the samples targeting CCQE and CC1π + interactions, respectively.
Neutral current π production events are a background in both e-like and µ-like samples. In the former, electromagnetic showers resulting from π 0 → γγ decays can mimic an electron-like event; while in the latter, charged pion detector signatures are only significantly different from those of muons through their hadronic interactions, which are not always present. Additional criteria are used to remove these backgrounds in all analysis samples.  The values of sin 2 θ 12 and ∆m 2 21 are taken from [42], the value of sin 2 θ 13 from [43], and all the other values are set to the most probable value found in a previous T2K measurements [29].

Mass Ordering Normal
For the µ-like samples a cut is applied on the likelihood ratio of the best-fit single-π + to the single-µ hypotheses (Λ π + µ ) as a function of reconstructed µ momentum (p µ ). This selection criterion is only available in the FiTQun reconstruction algorithm and is deployed for the first time in the current analysis.
The NCπ 0 rejection criterion for the e-like samples, based on the likelihood ratio of the best-fit π 0 hypothesis to the e-like hypothesis (Λ π 0 e ) and the reconstructed π 0 mass (m γγ ), is unchanged from previous analyses, where it was the only use of FiTQun reconstruction. Distributions of the neutral current π 0 rejection discriminator for the 1R e and 1R e + 1 d.e. samples are shown in Figs. 19 and 20 respectively. For 1R µ events, the distribution of the neutral current π + discriminator is shown in Fig. 21.
Finally, e-like samples are required to have a reconstructed neutrino energy (E rec ) lower than 1250 MeV, as events beyond this energy are insensitive to neutrino oscillations and are more susceptible to systematic uncertainties associated with the neutral current rejection cut.
The selection criteria described above are summarized in Tab. IV and the breakdown of data and MC events passing each cut is given in Tabs. V, VI and VII, for the samples targeting ν e CCQE, ν e CC 1π + and ν µ CCQE interactions, respectively. respectively, and in Fig. 27 for 1R µ events.

C. Optimization of selection criteria
The likelihood ratio of best-fit e and µ hypotheses gives very good separation between these classes of events, with the separation improving at higher momentum. The cut line chosen to select e-like and µ-like events accounts for the momentum dependence of the likelihood ratio and achieves mis-identification rates smaller than 1% for true CCQE events across the T2K energy range.
Pion production in neutral current events forms one of the main backgrounds to both µ-like and e-like selections. Furthermore, since the cross sections for these processes are not known precisely, these contributions carry significant systematic uncertainties into the signal samples. A simplified neutrino oscillation analysis framework is used to optimize the neutral current rejection criteria taking into account systematic uncertainties and with statistics corresponding to an exposure of 7.8 × 10 21 POT, evenly split between neutrino and antineutrino modes. In this simplified analysis framework, the systematic uncertainties (taking into account the near detector constraints) are propagated to the SK prediction as a covariance ma-  trix in reconstructed neutrino energy. As a result, the SK samples do not constrain the nuisance parameters, and no correlations between these and the neutrino mixing parameters are taken into account. The four samples targeting e-like and µ-like CCQE events in both FHC and RHC neutrino beam mode are fit simultaneously to an Asimov data set [87] to determine the experiment's sensitivity under different neutral current rejection cut points.
The criterion to reject NCπ + events in the µ-like samples, a line cut on the Λ π + µ vs p µ plane, is chosen to minimize the width of the sin 2 θ 23 1σ confidence interval. This cut, which was not available with the reconstruction techniques used in previous T2K neutrino oscillation analyses, reduces the NC contribution to the µ-like samples by a factor of two, while selecting CCQE events with 99% efficiency.
The NCπ 0 rejection line cut in the Λ π 0 e vs m γγ plane, applied to the e-like samples, is optimized based on the significance to exclude δ CP = 0. As the optimal cut line is very close to the one used in previous T2K neutrino oscillation analyses, this criterion is not updated for the analysis described here. It should be noted that the relative impact of this cut on the selected sample is significantly smaller in the analysis described here, where it reduces the NC contribution in the e-like samples by a factor of three, compared to previous analyses, where the reduction is of a factor of 6. This is due to the excellent performance of the multi-particle search algorithm, which reduces the NC background in e-like samples by a factor of five by correctly identifying the two γs from π 0 decays with higher efficiency than the previously used algorithms.
Optimization metrics for both neutral current rejection criteria are shown as a function of the cut parameters in Figs. 28 and 29, along with the chosen cut points and distributions of the signal and background events in the cut variable planes. While this optimization is performed assuming the neutrino mixing parameters preferred by previous T2K results as given in Tab. III, it should be noted that in both cases the optimization metrics show large regions around the optimal points where they are consistently good. Therefore, the sensitivity of this analysis does not depend strongly on the exact value of the cut points and the experimental sensitivity is not expected  to depend strongly on the choice of neutrino mixing parameters used in the optimization procedure.

D. Fiducial volume expansion
In previous T2K neutrino oscillation analyses, events at SK were required to have wall ≥ 200 cm in order to remove entering backgrounds and ensure the quality of reconstructed quantities. The new event selection presented here, benefiting from improvements in reconstruction, provides an opportunity to re-optimize the FV criterion for the T2K analysis samples, with the objective of increasing the event yield while maintaining a high purity of signal events in the selected samples and a low impact of detector systematic uncertainty.
A two-dimensional parameterization of the FV criteria is chosen to allow for balancing two classes of effects. On one hand, the reduction of entering backgrounds and mitigation of the impact of known shortcomings of the simulated detector geometry are achieved with a wall threshold, as in previous analyses of T2K data. On the other hand, important aspects of event reconstruction, such as particle identification, improve with the number of PMTs illuminated by the Cherenkov ring patterns. As this number grows with the distance to the detector walls along the particle direction of travel, towall, a threshold on this distance is used to select events with a high reconstruction performance.
The FV criteria are optimized in a fit to SK atmospheric neutrino data, from which the systematic uncertainty associated to the particle counting and identification is also extracted.
The SK atmospheric neutrino FC data is divided into 18 samples consisting of combinations of six detector regions, defined with wall and towall as shown in Fig. 30, and three classes of events discriminated by the number (0, 1 or 2+) of detected Michel electrons. The six detector regions were chosen to isolate areas where the systematic uncertainty associated to the detector model is expected to differ, while maintaining an adequate level of statistics in the SK atmospheric neutrino data sample.
These samples are projected into three particle identification variables (Λ e µ , Λ π 0 e and Λ π + µ ) and a continuous variable that discriminates single-particle from multi-particle events (Λ 2−particles 1−particle ). In each of the samples the MC is split into six true event topologies consisting of: a single visible e, a single visible µ, a visible e with other visible particles, a visible µ with other visible particles, a single π 0 , and finally events with a single visible p or π + . For each topology, the particle identification and counting variables are linearly transformed with two nuisance parameters each: where Λ α β is the transformed variable and a and b are "scale" and "shift" nuisance parameters, respectively. These "scale" and "shift" parameters are estimated with a Markov Chain Monte Carlo (MCMC) method [88] that samples the Poisson likelihood for the observed data given the model that includes, in addition, parameters to capture the uncertainty on the atmospheric neutrino flux and cross sections. The SK atmospheric neutrino To ensure the validity of the atmospheric neutrino fit for the T2K oscillation analysis, high-statistics control sample data collected over the entire beam data period are used to monitor the detector stability. The flux and cross-section parameterizations used in this procedure are simpler than those used in oscillation analyses and are based on Refs. [89] and [90]. The atmospheric neutrino flux is scaled by two parameters, one affecting events with neutrino energy lower than 1 GeV, with a prior uncertainty of 25% and the other for events with energies higher than 1 GeV, with a prior uncertainty of 15%. The ratio of ν e + ν e to ν µ + ν µ events is controlled by a parameter with 5% prior uncertainty. A prior uncertainty of 20% is assigned to both CC non-QE and NC cross sections. The CCQE cross section has a more detailed parameterization, with events below 190 MeV having a prior uncertainty of 100% and events in the 190 MeV to 1 GeV range being characterized by 11 parameters which scale the cross section in unevenly spaced energy bins with gradually decreasing prior uncertainties from 41% to 2.2%, with 0.6 GeV events being assigned 5.4% prior uncertainty. CCQE events with energy in the 1 to 2 GeV and 2 to 3 GeV ranges are assigned 1.7% and 0.9% prior uncertainty, respectively. Finally, each sample Examples of the posterior predictive distributions resulting from the MCMC sampling are shown in Fig. 31.
To quantify the impact of the disagreement between data and MC on the T2K samples, the "shift" and "scale" parameters are sampled from the MCMC posterior and applied to the T2K beam MC. The T2K selection criteria other than FV are applied, and a fractional uncertainty is calculated for each analysis sample in each of the detector regions, with the MC separated in five true categories: CCQE, CC non-QE, CC events with mis-identified lepton flavor, neutral current, and entering backgrounds. In this procedure, the atmospheric neutrino flux and crosssection parameters used in the MCMC fit are marginalized over. The resulting fractional uncertainty on the expected number of events is taken to be the systematic uncertainty associated to the SK detector model.
With the detector systematic uncertainty estimated for each detector region, the optimal FV criteria are found by maximizing the following figure of merit, which quantifies the sensitivity of the sample with respect to changes in θ 23 and δ CP for µ-like and e-like samples, respectively: whereN is the expected number of events in a given sample, θ is the parameter of interest (θ 23 for the µlike samples and δ CP for the e-like samples) and σ syst is  rithm, optimized neutral current rejection cuts and expanded FV volume has a significant impact across all analysis samples. The signal acceptance in the 1R e sample increases by 20%, with the same purity as in previous analyses, while the signal events in the 1R e + 1 d.e. sample increase by 30%, with a reduction in the mis-identified muon background of 70%. The 1R µ samples have an increase in signal efficiency of 15% and a reduction in backgrounds of 40%.

E. Systematic uncertainty
The systematic uncertainty associated with the SK event selection is propagated to the oscillation analysis fitting frameworks as a covariance matrix in bins of either reconstructed neutrino energy or reconstructed lepton momentum, and broken down in true event topologies. Systematic uncertainties on the particle count and identification are extracted from the atmospheric neutrino data with the MCMC method described above. The uncertainties on decay-e tagging and misidentification of muons as electrons are extracted from differences between the data and the MC in a control sample consisting of cosmic ray muons that stop within the ID. The cosmic ray muon events are weighted in momentum and towall to match the expected distribution of beam-induced muons. The uncertainty on the decay-e identification efficiency is 1% and the uncertainty on the rate of spurious decay-e tags is 0.2%. The relative uncertainty on the mis-identification of muons as electrons is 30%, though the contamination of ν µ CC events is smaller than 1% in the ν e samples without decay-electrons and around 2% in the 1R e + 1 d.e. sample. Uncertainties introduced by the FV criteria are also estimated with MC to data comparisons in the cosmic ray muon sample, with both vertex and direction uncertainties taken into account. The reconstructed vertices in the cosmic ray muon sample cluster at the top or side walls of the detector, allowing for shifts in the MC relative to the data to be identified. The uncertainty on the direction is estimated by comparing the reconstructed muon direction to the equivalent quantity estimated using the muon and subsequent decay-e vertices. The uncertainties are 2.5 cm for the vertex position and 0.24 degrees for the direction, corresponding to a 0.3% to 0.4% systematic un- The uncertainty on the π 0 rejection efficiency in 1R e samples is estimated using hybrid π 0 sample constructed by superimposing an e-like event from the atmospheric neutrino or decay-e from cosmic ray muon data with a simulated γ with kinematics taken from NCπ 0 events in the MC. The procedure is performed using both the MC and the real data as the source of the event and the difference in π 0 rejection efficiency between the data-MC and MC-MC samples is taken as the systematic uncertainty, binned in reconstructed lepton momentum and angle with respect to the beam. The overall uncertainty on the π 0 rejection efficiency is 26%.
A summary of the uncertainties associated with the SK detector model is given in Tab. VIII.
To propagate the systematic uncertainty on the SK event selection to the neutrino oscillation analysis frameworks, a covariance matrix is computed using the beam MC and the uncertainties quoted above. The MC is weighted with the flux and cross-section parameters at their central value from the fit to the near detector data and neutrino oscillation weights using the parameters in Tab. III are applied. Variations of the MC are then produced with random throws of the systematic effects described above and projected into bins of true event topology and reconstructed neutrino energy or lepton momentum to produce the covariance matrix. The diagonal elements of the covariance matrix in reconstructed neutrino energy are shown in Fig. 33. Hybrid π 0 e/µ, π 0 /e, π + /µ, single/multi particle identification -Atmospheric neutrino

X. NEAR/FAR EXTRAPOLATION FIT
Parameterized flux and cross-section models are used to calculate the predicted event rates at ND280 and SK. These models are fit to the high statistics, unoscillated near detector data to constrain the parameter uncertainties and tune their central values. An additional uncertainty is included in the flux covariance to account for the fact that the near detector fit results for Runs 2-6 are extrapolated to the far detector, which uses a larger data set from Runs 2-9. As in Ref. [26], additional uncertainties which affect ν µ cross sections. The ND280 likelihood and fitting methods are unchanged since the analysis described in Ref. [26]. The 14 event samples are binned in p µ and cosθ µ , giving 1624 bins in total. The full likelihood includes a contribution from the binned χ 2 data-model comparison, and a prior penalty contribution for each parameter.
There are two near detector fitting frameworks used in this analysis. One fitter uses MINUIT to find the parameters which maximize the likelihood, while the other uses Markov Chain Monte Carlo (MCMC) methods to sample the parameter space. Both frameworks treat the systematics identically, and apply parameter variations on an event-by-event basis in the fit. The resulting parameter values from the MINUIT-based fit are used by two of the oscillation analyses (detailed in Sec. XI), with a covariance matrix describing their uncertainties. The MCMC analysis performs joint near and far detector data fits, but can also run near detector only fits for cross group validation.
The only change to the fitting frameworks since the last analysis was the treatment of the Fermi surface momentum systematics near their physical boundary. Previously, the covariance could not be calculated when the parameter was at its limit, causing the fit to not converge. For this analysis, the penalty contribution to the likelihood was 'mirrored' around the physical boundaries for the p F parameters. This meant the parameters were allowed to pass beyond their physical boundaries. The mirroring was performed by setting the likelihood for values beyond the boundary to the value of the likelihood the same distance from the boundary on the other side. For example, for a physical boundary at +1.0, the like-lihood at +1.2 would be the same as the likelihood at +0.8. This allowed the uncertainty to be calculated at the limit, and the fit to converge. There is a significant reduction in the postfit uncertainty for the majority of parameters. Those that are not constrained by the near detector fit are uncertainties that only apply to interactions with low statistics in the near detector. In the last analysis, Ref. [26], the neutrino flux increased for all samples and species, but this effect is no longer present. The difference between the nominal simulation and data is now being absorbed by the movement of other parameters, in particular the BeRPA model.
The goodness-of-fit for the near detector analysis was estimated by calculating the p-value in the MINUITbased framework. Toy data sets were produced by throwing all systematics according to their prior covariance, and applying them to the nominal simulation prediction. The likelihood for each toy data set is shown in Fig. 36, along with the likelihood from the data fit. The overall p-value for the fit is 47.3%. Near detector only fits with the MCMC analysis framework were used to cross-check the two analyses. The postfit cross-section parameters for the two fits are compared in Fig. 43, showing good agreement.

XI. OSCILLATION ANALYSIS FITTERS
To produce constraints on the three-flavor PMNS oscillation parameters, the rate and kinematic distributions of all five SK event samples are analyzed simultaneously. Systematic uncertainties in the flux, interaction and detector models are accounted for using systematic parameters applied as weights to the nominal prediction as described in [26]. Confidence regions and intervals are produced from marginal likelihood distributions as a function of parameters of interest.
The predicted kinematic distributions of each SK sample are generated from the nominal SK simulation to which weights are applied for each set of oscillation and   systematic parameter values. The oscillation weights correspond to the three-flavor oscillation probability, calculated with matter effects and dependent on the true neutrino energy and flavor [22], while the systematic weights are multiplicative factors.
A likelihood, L, is calculated according to Eq. 15 as the product of the Poisson likelihood ratios for the number of events in each bin of the kinematic variables considered for each SK sample: In Eq. 15, o is a vector of the parameters of interest, f is a vector of nuisance parameters, n obs s,i is the observed number of events in kinematic bin i of SK sample s which has a total of N s bins, n exp s,i = n exp s,i (o, f ) is the corresponding expected number of events. The parameter(s) of interest correspond to one or more oscillation parameters among sin 2 θ 13 , ∆m 2 32/31 , sin 2 θ 23 , and δ CP . The nuisance parameters correspond to the systematic parameters, and the oscillation parameters not chosen as parameters of interest in a given fit. To obtain a likelihood which only depends on the parameter(s) of interest o and a data set x (the set of n obs s,i in Eq. 15), a marginal likelihood L marg is computed: the full likelihood, made of the product of the likelihood L defined in Eq. 15 with the prior constraint on some of the parameters π (o, f ), is numerically integrated over the nuisance parameters: . In (c) and (d), the particle-counting parameter is shown for events with 1 and 2 decay-electrons, respectively. The e/π 0 discriminator distribution is shown in (e) for e-like events and and the µ/π + distribution is shown in (f) for µ-like events.
(7.53 ± 0.18) × 10 −5 eV 2 /c 4 [42]. The three parameters sin 2 2θ 23 , ∆m 2 23 and δ CP are unconstrained. T2K is sensitive to sin 2 θ 13 , but to date, the world's most accurate measurements of this parameter come from reactor neutrino experiments [10][11][12]. To obtain increased sensitivity to the other oscillation parameters, the reactor average of sin 2 2θ 13 = 0.0830 ± 0.0031 [43] may be used as a Gaussian constraint, which will subsequently be referred to as the "reactor constraint". It is also useful to evaluate the T2K experiment's constraint of the oscillation parameters without the contribution of the reactor experiments, and fits without this constraint on sin 2 2θ 13 were also performed.
Three different analysis frameworks are used to perform the fit of the far detector data. They will be labeled as A, B and C, and follow the general procedure described above, but differ on a number of points summarized in Tab. X: a. Kinematic information used to fit the data from the electron-like samples Two-dimensional distributions are used for those samples, either the combination of the momentum and angle with respect to the beam direction of the particle reconstructed as the lepton (p lep , θ lep ), or the reconstructed energy assuming CCQE kinematics (E rec ) combined to this angle θ lep .
b. Oscillation probability calculation The events can either be binned in true neutrino energy with an oscillation probability corresponding to the mean true neutrino energy of the bin, or have individual oscillation probabilities computed for each event's true neutrino energy.
c. Use of the near detector data The near detector data are either used in a simultaneous fit with the far detector data, or they can be fit separately to constrain the neutrino flux and interaction systematic uncertainties. In this second case, the constraint is propagated to the far detector analysis through a covariance matrix. The two methods are expected to lead to different results for the far detector fit if the constraint on the systematic parameters obtained with the near detector data cannot be properly described by a multi-variate normal distribution.
d. Fitting method Two of the analyses use a grid search method, where the likelihood in terms of the parameters of interest is computed for different fixed values of those parameters while marginalizing over the other parameters. This marginalization over the nuisance parameters is done through numerical integration. The last analysis uses an MCMC method to sample the parameter space, where the density of the obtained samples follows the joint posterior probability density of the parameters. Those samples can then be binned to produce 1D or 2D posterior distributions for the parameter(s) of interest, effectively marginalizing over the nuisance parameters. A full description of the MCMC analysis can be found in [91].
The T2K experiment's expected median constraints on the oscillation parameters can be evaluated by fitting to an Asimov data set, generated for the true values of the oscillation parameters given by Tab. III, and nominal values of the systematic parameters. Using Analysis A, the   Fig. 44, with all oscillation parameters fixed to the values in Tab. III. Using the differences between the three analyses, it can be checked that the results are not too sensitive to the details of how the analysis was performed. The expected sensitivities obtained by each analysis for δ CP and ∆m 2 vs. sin 2 θ 23 are shown in Fig. 47. They show good agreement between each analysis, thus the differences in the analysis methods do not significantly impact the results. Additional comparisons between the three analyses were performed for each combination of parameter(s) of interest, mass ordering, use of reactor constraint. In each of these, the results produced by the three analyses were found to be consistent.

Equation 2
shows that the neutrino oscillation probability depends upon the energy of the neutrino and its path length from creation to interaction. In longbaseline accelerator-based neutrino experiments the neutrino path length is fixed. Accurately measuring the neu-   set of models that describe the world's neutrino data but there are a number of models that are in comparable agreement to the world's data, as described in Ref. [54]. However, these models map true neutrino energy to re-  constructed neutrino energy in different ways. This is shown in Fig. 4, where the bottom plot compares the reconstructed energy bias between quasi-elastic-like and ∆-like 2p2h interactions. The choice of interaction model therefore affects the neutrino energy distribution that experiments infer from their observed neutrino events. This in turn can affect their measurement of the neutrino oscillation parameters, as has been shown in Ref. [92]. In this analysis a comprehensive set of neutrino interaction models have been tested using simulated data studies to quantify their effect on the T2K oscillation result. The simulated data procedure is described in Ref. [26], while the model changes that are tested here are described in Section VI. Simulated data is created for both the near and far detectors, where a fit is performed in the same way as for the real data. The resultant oscillation parameter contours are then compared to those extracted from a fit to the Asimov data set. If the T2K oscillation analysis is insensitive to the model change, or has the freedom to account for it correctly, then the simulated data contours and the Asimov contours should be very similar. The oscillation parameter values used for this study are shown in Tab. III. Other parameter sets with δ CP = 0 and sin 2 θ 23 = 0.45 were also studied, but showed no significant difference to the results presented here.
The likelihood distributions for each oscillation param-eter are created for both the simulated data fit and the Asimov fit. Any change in the center of the 2σ confidence interval for each parameter is taken as a bias due to the change in the interaction model introduced to the simulated data. This is compared to the uncertainty on the parameter coming from the systematic uncertainties included in the analysis. If a simulated data bias is greater than 50% of the systematic uncertainty on a parameter then an additional uncertainty is added to the analysis to account for this.
A. Simulated data study of the nucleon removal energy As described in Section VI the T2K neutrino event generator, NEUT 5.3.3, implements a Relativistic Fermi Gas (RFG) nuclear model. To remove a nucleon from the nucleus requires energy to overcome the nuclear potential. The nucleon removal energy (NRE), can be measured by electron scattering experiments, but is not known perfectly. Even the definition of this quantity is not simple [79]. In addition, the RFG is a very simple model of the nuclear structure. More advanced models, such as spectral function models, provide a much more detailed description of the nucleon energies within the nucleus,   each of which has its own NRE. The T2K oscillation analysis does not have a parameter to account for the NRE uncertainty directly in the fit. Therefore the effect of the NRE uncertainty on the oscillation parameter measurements is evaluated using a simulated data study. Neutrino events were generated with the RFG nucleon removal energy increased by 18 MeV from the nominal value (25 MeV for interactions on Carbon and 27 MeV for interactions on Oxygen). Increasing the NRE results in a decrease in the energy available to the lepton produced in the neutrino interaction. As a consequence, the events simulated with a larger NRE produced leptons with lower momenta than the nominal MC. This momentum shift was calculated as a function of neutrino energy and lepton angle, shown in Fig. 48. The momentum shift in Fig. 48 was applied to the reconstructed momenta of the charged lepton for the full near and far detector MC. The MC was then scaled to match the POT exposure of the data, creating a simulated data set. The ratio of the simulated data to the nominal MC is shown in Fig. 49 for the ND280 CC 0π sample, highlighting the shift in events from high momentum bins to lower momentum bins. A fit is then performed on this simulated data using the analysis framework described earlier in this paper. There are no parameters in the T2K cross-section model that change the lepton momen-tum directly, but combinations of parameters are able to mimic this effect. The result of fitting the simulated near detector data is presented in Fig. 50, which shows the best fit values for the flux and cross-section model parameters. Figure 50 shows that the low energy flux is increased, to increase the rate of low momentum events. The 2p2h shape parameters control whether the 2p2h events in the MC are produced more by ∆(1232) resonances (values above 1.0) or other modes (values below 1.0). The ∆(1232) resonance produces leptons with a lower momentum than the other modes. This means that increasing the shape parameter increases the rate of 2p2h events in the low momentum region.
The near detector fit result is used to predict the oscillated event distribution at the far detector. This is shown in Fig. 51, which compares the nominal MC to the simulated data and the near detector prediction. The near detector prediction is closer to the simulated SK data than the nominal MC simulation, but does not match well below the oscillation dip. The nominal prediction is fit to the simulated SK data to extract the best fit oscillation parameter values and their 2σ confidence intervals. These are compared to oscillation parameter values and confidence intervals produced by fitting to the Asimov data set, as described in Sec. XI. This comparison is shown in Fig. 52.
Changing the NRE changes the shape of the simulated data, not the normalization. Since the constraint on sin 2 θ 13 and δ CP is largely driven by the normalization of the electron-like sample, it is unsurprising that these parameters are relatively unaffected. In addition sin 2 θ 13 is strongly constrained by the reactor experiment measurements. However, both sin 2 θ 23 and ∆m 2 32 show a significant difference between the nominal and the simulated data results, with sin 2 θ 23 shifting towards maximal disappearance and ∆m 2 32 decreasing. The changes in the oscillation parameter contours indicate that the T2K cross-section model parameterization cannot account for changes to the NRE. The near detector fit mis-attributes the NRE change to the flux and 2p2h model parameters. The near detector post-fit prediction has a different neutrino energy distribution to the nominal MC, and so produces different far detector event distributions when the neutrino oscillation probability is applied. As a result the oscillation parameters extracted from the simulated data fit no longer match those from the nominal MC analysis.
To account for this in the T2K analysis an additional uncertainty is introduced. A spline is created for each bin of the five far detector sample histograms. The value of the spline is the ratio between the simulated far detector data and the far detector prediction calculated using the near detector simulated data fit result. Spline knots are created using NREs of 18 MeV, 27 MeV (nominal) and 45 MeV at both near and far detector. The splines are 100% correlated across all sample bins, and produce a multiplicative weight to scale the far detector prediction in each bin. The spline takes into account the change in the far detector prediction due to both the changing NRE and the mis-fitting at the near detector. This means that it does not provide a measurement of the true NRE, but an 'effective NRE' taking into account both of these effects. A prior constraint is placed on this effective NRE parameter, setting the central value to 27 MeV, with an uncertainty of ±18 MeV.
The result of including this parameter in the simulated data study is shown in Fig. 53. The new parameter increases the size of the oscillation contours whilst shifting the simulated data result to be in much better agreement with the expectation. There is some residual difference between the contours, particularly in the ∆m 2 32 best fit point. This difference is included as an additional uncertainty in the analysis by smearing the ∆m 2 32 likelihood surface. The far detector oscillation fit likelihood as a function of ∆m 2 32 has a Gaussian distribution. This distribution is convolved with a Gaussian of unit area, centered at 0, with a width given by the shift in the ∆m 2 best fit point between the simulated data fit and the expectation. This is equivalent to adding this shift as an uncertainty on ∆m 2 32 in quadrature with the existing uncertainties in the analysis. The result of this smearing is shown in Fig. 53. Table XII shows the final bias table for simulated data sets studied in the analysis, after the addition of the NRE uncertainty parameter. For the data-driven E ν − Q 2 category the largest effect from the three simulated data studies is shown. In all cases the observed bias on sin 2 θ 23 and δ CP was insignificant compared to existing systematic uncertainty on the parameter and so no additional uncertainty was introduced. Non-negligible bias was observed for ∆m 2 32 . The quadrature sum of the observed biases, 4.1 × 10 −5 eV 2 c −4 , was added as an additional uncertainty on ∆m 2 32 using the method described above. The effect of the systematic parameters on the predicted event rates on each SK event sample, including the additional NRE uncertainty, is shown in Tab. XIII. The effect of the prior uncertainties on the (typically marginalized) oscillation parameters sin 2 θ 12 , ∆m 2 21 and sin 2 θ 13 is also

XIII. OSCILLATION ANALYSIS RESULTS
Following the recommendations in [93], we produce results using different statistical approaches, both frequentist and Bayesian (with analysis of sensitivity to prior) and test the frequentist properties of our Bayesian methods when possible. Using the three far detector analyses described previously, point and interval estimations are made for the parameters sin 2 θ 23 , δ CP and ∆m 2 32 (normal ordering) or ∆m 2 13 (inverted ordering). Two types of intervals are produced: confidence intervals (with approximate coverage based on the constant ∆χ 2 method in most cases, and with exact coverage using the Feldman-Cousins unified approach for δ CP ) and credible intervals. The mass ordering was studied using mainly Bayesian hypothesis testing, with additional frequentist checks.
A. Measurements of the parameters of the 3 flavor oscillation model

∆χ 2 and frequentist results
Intervals based on the constant ∆χ 2 method are produced for the different parameters using analyses A and C (analysis B can produce similar intervals for comparison purpose, although its main results are the credible intervals described in XIII A 2). As their results are in good agreement, only the results obtained with analysis A are shown in this section, unless otherwise indicated. The best fit values and 1σ confidence intervals obtained for the different parameters in both mass ordering scenarios are summarized in Tab. XIV and in Tab. XV with and without using the results of reactor experiments to constrain sin 2 θ 13 , respectively. The global best fit was found to be for the NO, and the data show a preference for the upper octant. These preferences will be quantified in part XIII C. The ∆χ 2 = −2 ln (L/L max ) functions obtained for δ CP with and without using the results of reactor experiments to constrain sin 2 θ 13 are displayed in Fig. 54. The favored and disfavored values of δ CP are similar between the two cases, but the constraint on δ CP becomes stronger when the reactor experiments results are used. The obtained 90% confidence regions for (sin 2 θ 23 , δ CP ) are displayed in Fig. 55. The largest parts of the confidence regions are located in the upper octant, especially when the constraint from reactor experiments is used, but the results are still compatible with maximal mixing. For the atmospheric parameters, the obtained normal ordering 90% confidence region for (sin 2 θ 23 , ∆m 2 32 ) is shown in Fig. 56 together with the measurements from other neutrino oscillation experiments. Good agreement is seen between all of the experiments.  Section XI demonstrated that analyses A, B and C have similar sensitivities (Fig. 47), their data fit results are now compared for δ CP vs sin 2 θ 13 in Fig. 57 and ∆m 2 vs sin 2 θ 23 in Fig. 58. The largest differences are observed for analysis C, in particular for the atmospheric parame-  holds [94], but can result in poor coverage when this is not the case. In particular, it is not expected to give proper coverage for δ CP , due to the cyclic nature of the parameter and the presence of physical boundaries at ±π/2. In these cases intervals with exact coverage can be formed directly from the likelihood ratio, by computing the appropriate ∆χ 2 critical value for each value of the parameters considered, as proposed by Feldman and Cousins [95]. This method is very CPU-intensive, so it is only used for the one-dimensional δ CP interval.
For analyses A and C, critical values of ∆χ 2 (values of the parameter of interest for which the ∆χ 2 is lower than the critical value for a given confidence level are included in the confidence interval for this level) were calculated for multiple values of δ CP as follows. Pseudo-experiments were generated at 9 evenly-spaced grid points of δ CP in each mass ordering, and at two additional points near the intersection of the 3σ critical value and ∆χ 2 curves. The systematic and other oscillation parameters are randomly varied to generate the pseudo-experiments, with different procedures for the different parameters. The systematic parameters, sin 2 2θ 12 , ∆m 2 21 and sin 2 2θ 13 are drawn from the prior probabilities described in Sec. XI, with sin 2 2θ 13 constrained by reactor data. The two oscillation parameters which do not have a prior constraint in the analysis, sin 2 θ 23 and ∆m 2 32 , are drawn from the 2D likelihood resulting from the fit of an Asimov data set generated at the best fit point obtained in the T2K+reactor data fit. Each set of parameter values is used to generate predicted kinematic distributions for each sample, which are then sampled assuming a Poisson probability in each recon-structed variable(s) bin to obtain a pseudo-experiment. The pseudo-experiments are fit in the same manner as the real data, and the ∆χ 2 between the true and best fit δ CP and mass ordering is recorded. The N th percentile of this distribution then forms the N% critical value for this combination of δ CP and mass ordering.
The obtained critical values for δ CP are displayed in Fig. 59. For all confidence levels, significant deviations from the values expected for a parabolic log likelihood function (as assumed by the constant ∆χ 2 method) are observed, demonstrating the necessity of the method. The critical values obtained by Analysis A were compared to those of Analysis C and were generally found to be in good agreement, with minor differences mostly explained by the different kinematic variables used by the two analyses for appearance samples.
In order to better understand the structure of the critical values, the effects of the δ CP -mass ordering and cos(δ CP ) degeneracies, the physical boundaries around δ CP = ±π/2 and of the beam exposure were studied and found to have a significant effect, due to the following factors: 1. The amount by which the ν µ → ν e and ν µ → ν e oscillation probabilities can be changed by varying δ CP is limited. This creates an effect similar to physical boundaries at the values of δ CP corresponding to the maximum and minimum number of events in the appearance samples. The critical values in those regions are therefore lower than those expected under the assumption of a parabolic loglikelihood.
2. Mass ordering is a discrete parameter and works as an additional degree of freedom, raising critical values, though without sufficient freedom to make the critical values behave as if the problem had 2 degrees of freedom.
3. The effect of physical boundaries is more visible for the 3σ critical values, since the critical values at this confidence level are usually determined by the pseudo-experiments corresponding to more extreme statistical fluctuations.

4.
The critical values increase with statistics, and the values obtained have increased relative to their values in previous T2K analyses [25,26,29].
To understand this last point, critical values for the 1, 2, 3σ and 90% confidence levels were computed assuming different exposures, and were found to non-linearly increase with exposure in all cases. The leading cause was found to be the approximate degeneracy between δ CP and π − δ CP (T2K observables are mainly sensitive to sin δ CP , with cos δ CP having a much smaller effect), which acts as an additional pseudo-degree-of-freedom that is negligible at low exposures but becomes more important as more data is taken. The above physical boundary effects also contribute at all exposures; however, for true δ CP values  this level would be computationally intractable, so additional care is taken to ensure the validity of the 3σ confidence intervals. The confidence intervals are calculated for the different combinations of the nominal and ±1σ binomial error values of the critical ∆χ 2 , and the largest interval is kept. A second source of error comes from the interpolation of the critical ∆χ 2 between the points where it was computed. To minimize this effect, critical values have been calculated at two additional δ CP values, each close to the boundaries of the 3σ confidence interval. The resulting confidence intervals obtained for δ CP are listed in Tab. XVI and the obtained 3σ intervals are displayed in Fig. 60, superimposed on to the observed ∆χ 2 function.
In both mass orderings, the two CP conserving values of 0 and π are outside of the 2σ confidence intervals. Our data therefore exclude the conservation  of CP symmetry in neutrino oscillation at the 2σ level. For the inverted mass ordering, both CP-conserving values are outside of the 3σ confidence intervals. For the normal mass ordering, δ CP = 0 is just outside of the 3σ confidence interval, while δ CP = ±π is inside. The robustness of the exclusion of δ CP = 0 given the model uncertainties is studied in more details in Sec. XIII B 3, where it was found that the boundary of the 3σ interval is so close to δ CP = 0 that this point can move in and out of the 3σ interval due to changes in the model. On the contrary, the exclusion of the CPconserving values at the 2σ level was found to be robust. The histogram axes have been truncated for clarity.

Bayesian results
Bayesian results for the oscillation parameters were produced using Analysis B. The posterior probability of δ CP marginalized over both mass orderings obtained in a fit of T2K data with sin 2 θ 13 constrained by the reactor experiments is shown in Fig. 61 with two different prior probabilities for the parameter of interest. The CP conserving values of δ CP are found to be outside of the 2 σ credible interval in both cases, with the 1 σ range still covering the maximal CP violation value of δ CP = −π/2. Most of the results in this section use a prior probability uniform in δ CP , the alternative prior probability was tested to see the effect of the choice of prior. The alternative was chosen uniform in sin(δ CP ) as this is both the variable involving δ CP to which our observables are most sensitive to, and the one relevant for CP violation. The detailed comparison between the intervals obtained with the two prior probabilities can be found in Tab. XVII, where it can be seen that this choice affects the size of the intervals obtained, but does not change the main conclusions of the analysis. The proposal function for choosing the random steps used in Analysis B's MCMC can jump between mass orderings, with a 50% chance at each step to propose a point with the opposite sign of ∆m 2 32 . It therefore produces a single posterior probability for ∆m 2 32 , covering both positive and negative values. The posterior probability obtained in a fit of T2K data with constraint from the reactor experiments, is shown in Fig. 62. A clear preference towards the normal ordering, which contains 89% of the total posterior density, can be seen, with in particular the lack of an observed 1 σ credible region in the ∆m 2 32 < 0 part of the figure. Figure 64 shows the posterior density distributions and credible regions for combinations of all the oscillation parameters, as well as their individual posterior density and credible intervals. Correlations between the estimates of the different parameters can be seen, as well as the effect of marginalizing over one of them to obtain the onedimensional posterior distribution of another oscillation parameter. As the sin 2 θ 23 posterior distribution has a largely non-Gaussian shape, the marginal likelihood (for the other parameters) used in the analyses described in this paper is expected to have some differences with the profile likelihood commonly used in the neutrino community. Correlations are mostly seen between the atmospheric parameters, sin 2 θ 23 and ∆m 2 32 , and between ∆m 2 32 and δ CP . The correlation between the atmospheric parameters is clear from Eq. 10. For ∆m 2 32 and cos(δ CP ), both parameters shift the energy of the peak (dip) in the electron (muon) neutrino oscillation probability. Combined with the energy profile of the T2K beam this energy shift produces a change in event rate for the electron-like samples, and a smaller change in the muon-like samples. This change in both rate and shape introduces a correlation between the two parameters.

B. Additional checks on the validity of δCP results
The main result of the analysis described in this paper is the measurement of δ CP , with in particular the exclusion of CP conservation in neutrino oscillations at the 2σ level, and the fact that some of the possible values of δ CP are outside of the 3σ confidence intervals. The observed constraint on δ CP is considerably stronger than that expected from sensitivity studies, as can be seen by comparing Figs. 45 and 54. A number of additional studies were done to check the likelihood and robustness of this result.

Probability of the δCP results
The probability of getting a constraint this strong or stronger considering systematic and statistical uncertainties was evaluated assuming normal ordering and a true value of δ CP of −π/2. This case was chosen as it gave the best agreement between data and predictions in Tab. XI. 10 000 pseudo-experiments were generated following the same method as used in the determination of the Feldman-Cousins critical values. The pseudoexperiments were then fitted for δ CP , and the resulting ∆χ 2 distributions were compared to the ∆χ 2 functions obtained for the data. Figure 65 shows the observed ∆χ 2 = −2∆ ln L overlaid on the one-sided 1σ and 2σ bands. Those bands are built from the ensemble of ∆χ 2 values from the different pseudo-experiments at each value of δ CP , and cover the interval between zero and the 68.27% (1σ band) or 95.45% (2σ band) quantile. The observed ∆χ 2 function lies within the one-sided 2σ band for the fits done in the inverted mass ordering scenario, whilst falling just outside the 2σ band in the normal mass ordering scenario in the region around δ CP = 0. An alternative way to consider the results is to check what fraction of pseudo-experiments exclude one or both of the CP conserving values at a specified confidence level. As shown in Tab. XVIII, for the normal mass ordering, δ CP = 0 and δ CP = π are separately excluded at 90% confidence level in around 46-48% of pseudo-experiments and excluded at 2σ confidence level in around 32-34%. Both CP-conserving values are excluded at the 2σ confidence level in about 25% of the pseudo-experiments. Overall, a 2σ exclusion of CP conservation is not unlikely according to the model used in this analysis.

Contributions from individual samples
The discrepancies between predictions and observations seen in some of the samples used in the analysis were studied in more detail. In Tab. XI, the observed numbers of events in the far detector samples are generally in good agreement with the predicted number of events for δ CP = −π/2. Two samples show a difference with the nominal prediction: for the FHC 1R µ-like sample a small deficit of events can be seen in the data, corresponding to 1.33 times the statistical and systematic uncertainty on the predicted number of events. For the FHC ν e CC 1π +like sample, the discrepancy is larger with the observed number of events being twice the predicted number. To understand the impact on the δ CP result of the differences between predictions and observations from each sample, hybrid MC/data fits were performed: the data were replaced by the MC predictions for each sample in turn, as shown in Fig. 66.
It can be seen that in all cases including data, the constraint on δ CP is stronger than that obtained from the MC predictions alone. For the CP conserving value δ CP = −π, the most significant changes happen when data are replaced by MC predictions for the two samples which showed differences between observations and predictions. In particular, in the case of the FHC ν e CC 1π + -like sample, the ∆χ 2 at δ CP = −π becomes lower than the 2σ critical value obtained with the Feldman-Cousins method, meaning that the conservation of CP symmetry is no longer excluded with 2σ significance in this case.
Additional studies were therefore performed to estimate the likelihood of the observations for the FHC ν e CC 1π + -like sample. First, the probability to obtain 15 events or more in this sample when taking into ac-  after the addition of the 'effective NRE' parameter (middle) and after the additional smearing is applied to ∆m 2 32 (bottom). In all cases the expected result, as described in the text, is shown in black and the simulated data result is in red. The solid lines represent the 90% confidence region and the dashed lines represent the 1σ confidence region.
count statistical and systematic uncertainties was evaluated. This probability was found to be 2.49% for true values of the oscillation parameters corresponding to the T2K-only best fit, and 1.34% for the T2K with reactor constraint best fit point. As there are 5 samples in the analysis, a trial factor should be taken into account. The probability to have such an excess in at least one of the 5 samples (meaning an excess of events in the sample corresponding to a p-value smaller or equal to the p-value for the FHC ν e CC 1π + -like sample) was found to be 11.3% for the T2K only best fit, and 5.8% for the T2K+reactor best fit point.
p-values were also calculated when taking into account not only the number of events, but also the distribution of the kinematic variables of the observed events, and were found to be equal to the rate-only p-value. As can be seen in Fig. 67, the kinematic distributions of the data events are in good agreement with the prediction, and The observed T2K constant ∆χ 2 90% confidence regions (from analysis C) of sin 2 θ 23 and ∆m 2 32 with the reactor constraint and compared to the results from Super-K [4], NOνA [13] and IceCube [14]. 19  and 90% (solid) CL regions for ∆m 2 vs sin 2 θ 23 produced using analyses A, B and C, assuming normal mass ordering (top) or inverted mass ordering (bottom).
The reactor constraint is applied.
so taking into account the shape information does not increase the disagreement between data and prediction.
3. Additional interaction model checks Section XII described checks for possible biases coming from the choice of interaction model. Those checks were based on comparisons of MC sensitivities obtained with different interaction models. However, some model uncertainties affect primarily the FHC ν e CC 1π + sample, which brings only a small contribution to the δ CP sensitivity compared to the single ring e-like samples, and therefore do not have a significant effect on the simulated data studies. They can nevertheless have a sizable effect on the predictions for the FHC ν e CC 1π + sample, and given the large excess of data events in this sample, and its effect on the δ CP measurement, the possible impact of additional model uncertainties were considered. The additional studies were done in the context of the data fit and not the MC based sensitivity studies.
The first source of uncertainty studied is the data/MC discrepancy in the pion spectrum for the near detector   CC 1π sample. Even after tuning using the results of the near detector data fit, the predicted pion momentum spectrum did not reproduce the data for the near detector FHC CC 1π + sample: the data events had lower pion momentum than predicted by the model. This discrepancy could have an impact on the prediction for the far detector FHC CC 1π + sample: charged pions can appear at SK either as rings if they have high enough momentum (> 156 MeV/c), or as Michel electrons from the decay of the pion. If the pions produced in CC 1π + interactions have lower momentum than our model predicts, a larger fraction of the CC 1π + events at SK will enter the FHC CC 1π + sample than our MC predicts. Studies showed that a discrepancy of the size observed at the near detector could lead to a 10% increase in the number of events in the far detector CC 1π + sample. To evaluate the possible impact of this uncertainty on our δ CP results, the fit of the run 1-9 data was redone with two different modifications: • adding a 10% normalization uncertainty on the number of events in the far detector ν e CC 1π + sample.
• increasing the number of events in the SK ν e CC 1π + sample by 10% in the MC predictions used to fit the data.
The second model uncertainty concerns the number of hadrons produced in deep inelastic interactions in the low invariant mass region W < 2 GeV. Those interactions correspond to a specific interaction mode in the NEUT generator used to produce the MC, and have at least two pions produced at the interaction level. They can nevertheless enter the ν e CC 1π + sample if some of those pions re-interact in the nucleus (through the final  state interactions) or are not detected. The uncertainties on the number of hadrons produced in those interactions produce an uncertainty on the fraction that will enter the ν e CC 1π + sample. To assess the potential impact on the δ CP result, the fit of the run 1-9 data was redone using two alternative models for the hadron multiplicity model in the MC: the first (M1) is based on fits of data from deuterium bubble chamber experiments [96], while the second (M2) is based on the multiplicity part of the AGKY model [97]. Only the alternative model M1 was found to significantly change the expected number of events, with the biggest effect seen for the ν e CC 1π + sample (+13.5%). The predicted reconstructed energy spectrum was also affected for this sample, as the increase in the number of events was primarily at low reconstructed energy. Figure 68 shows how the ∆χ 2 obtained in the data fit for δ CP changed when an alternative model was used to fit the data. For each of the two model uncertainties, only the case which gave the largest effect is shown: 10% increase in the SK ν e CC 1π + sample MC predictions for the pion spectrum case, and M1 model for the hadron multiplicity model case. Both those changes predict an increase in the number of events in the ν e CC 1π + sample, reducing the discrepancy with the data, and therefore weakening the constraint on δ CP . The magnitude of the change in ∆χ 2 is small enough that it does not change the main conclusions obtained in the fit of the data for δ CP : the CP-conserving values are excluded at the 2σ level, and some values of δ CP are outside of the 3σ confidence level intervals. It is large enough however to change whether δ CP = 0 is excluded at the 3σ confidence level or not in the results of analyses A and C. FIG. 68: Change in the ∆χ 2 function for δ CP when fitting data using an alternative model for DIS hadron multiplicity (red) and pion momentum spectra (black). The ∆χ 2 was reduced by the value shown on the plot in each case. The difference is shown for the model change that gave the largest difference in each case.

C. Neutrino mass ordering
The question of the mass ordering was studied in a Bayesian framework, by computing posterior probabilities and Bayes factors for each ordering hypothesis. Results shown in this section use the constraint from reactor experiments on sin 2 2θ 13 unless otherwise stated.

Posterior probabilities and Bayes factor
All three analyses evaluated posterior probabilities to estimate the preference for the neutrino mass orderings. Analyses B and C additionally looked at the posterior probabilities for the octant of sin 2 θ 23 . The practical calculation of the posterior probabilities differs between the analyses, due to differences in the fitting techniques used. Analyses A and C first compute the marginal likelihood for each hypothesis, and compute the posterior probability for hypothesis H i from: Where the denominator sums over all the possible hypothesis combinations (either the two mass orderings, or the 4 combinations of mass ordering and octant), (N obs , x) is the observed measurement and P (H) is the prior probability, taken to be equal for all the hypotheses. The MCMC-based Analysis B calculates the hypotheses' posterior probabilities by counting the number of MCMC steps in the selected hypothesis against the total number of MCMC steps. That ratio gives a fully marginalized posterior probability for a given hypothesis. Table XIX shows the posterior probabilities for the mass orderings and the octants of sin 2 θ 23 obtained with Analysis B. Most of the posterior probability lies in the upper octant and normal mass ordering. The obtained values of the Bayes factors, corresponding to the ratio of the marginal likelihoods of the two hypothesis, are of 8.0 for the normal over the inverted mass orderings, and 3.9 for the upper over the lower octant. The commonly-used Jeffreys' scale [98] classifies both results as "substantial". The impact of the details of the analysis methods on the mass ordering results were checked by comparing the results of the different analyses. Analysis C prefers the normal mass ordering and the upper octant of sin 2 θ 23 with 91.1% and 80.4% posterior probability, respectively, whereas Analysis A finds a posterior probability of 87.7% for the normal mass ordering. As before, the largest difference is seen between Analysis C on one side and A and B on the other, with the main contribution being the choice of variables used for the kinematic information of the candidate events from the appearance samples.
The posterior probabilities above assumed equal prior probabilities for the different hypothesis. The effect of the choice of prior probabilities was checked by looking at how the posterior probabilities obtained in the data fit from Analysis C changed as a function of the prior probabilities assumed (Fig. 69). As expected when testing discrete hypotheses using Bayesian methods, the choice of the prior probability has a significant effect on the obtained posterior probabilities. However, the obtained curves are different from y = ±x, demonstrating that the data contain information about the mass ordering.

Frequentist properties of the Bayesian results for the mass ordering
As advocated in [99], the frequentist properties of the Bayesian mass ordering results were studied. More precisely, it was checked whether excluding a given ordering hypothesis (based on the other ordering having a posterior probability superior or equal to α%) was selecting the true ordering approximately α% of the time. For this purpose, 20,000 pseudo-experiments were generated, both for each mass ordering hypothesis and for different true values of δ CP . Then, the fraction of the pseudoexperiments which had a posterior probability higher than 95% for one of the two orderings were studied. It was found that this fraction depended strongly on the true value of δ CP assumed. Only for true values of δ CP close to −π/2 could one of the two orderings be excluded a non-negligible part of the time based on this criteria. Additionally, only the normal ordering could have a posterior probability above 95% in this case, regardless of which of the two hypotheses was assumed to be true when generating the pseudo-experiments. For true δ CP = −π/2, the true ordering was excluded 5.67% of the time the wrong one was. This shows that using the posterior probability to select the mass ordering, although a Bayesian method, nevertheless has reasonable frequentist properties in this case.
The pattern seen highlights the degeneracy between the measurement of δ CP and the determination of the mass ordering (also visible on Fig. 72 of the next section), which have similar effects on the predictions: they both change the two observables (the number of ν µ → ν e and ν µ → ν e events) in opposite directions. The sensitivity to the mass ordering as a function of δ CP (Fig. 70) shows that for δ CP < 0 (δ CP > 0), only for the true ordering being normal (inverted) does one get some expected values of the test statistics outside of the range predicted for the other ordering hypothesis. One can therefore only hope to exclude with non-negligible statistical significance the inverted (normal) ordering in this case.

Frequentist results for the mass ordering
For completeness, frequentist results for the mass ordering were derived by computing p-values for the Bayes factor. 100,000 pseudo-experiments were generated, randomizing over the nuisance parameters (including sin 2 θ 23 , δ CP and ∆m 2 32 , following a similar method as for the frequentist 1D δ CP results), and the value of the Bayes factor P (NO)/P (IO) obtained in the data fit was compared to the values obtained for those pseudoexperiments (Fig. 71) The fraction of the pseudo-experiments for which the Bayes factor was larger than what was observed in the data (corresponding to a result more NO-like) was found to be 4.87 × 10 −3 assuming true inverted ordering (inverted ordering p-value), and 6.5 × 10 −2 assuming true normal ordering (1 minus normal ordering p-value using standard definitions). Those two values are both low, and it would be misleading to claim exclusion of the inverted ordering based solely on the low p-value obtained for this hypothesis. An alternative tool sometimes used in the collider community in such cases is CL s [100], in which the p-value obtained for one hypothesis is penalized by one minus the p-value for the other hypothesis: CL s (IO) = p 0 (IO) 1 − p 0 (NO) (18) where p 0 (IO) and p 0 (NO) are the p-values for respectively the inverted and normal orderings. In this case, CL s (IO) = 0.075 is obtained. It should be noted that the more commonly used test statistic ∆χ 2 = χ 2 NO − χ 2 IO is equal to −2 ln(Bayes(NO/IO)). The p-values obtained with this ∆χ 2 test statistics are therefore the same as the ones presented here for the Bayes factor.

D. Summary of results
To summarize the oscillation parameter constraints, fits to δ CP , sin 2 θ 13 , sin 2 θ 23 and ∆m 2 32 have been produced using constant ∆χ 2 critical values, and their 1σ confidence intervals are listed in Tab. XIV and in Tab. XV with and without using the results of reactor experiments to constrain sin 2 θ 13 , respectively. Additionally, δ CP critical values have been calculated using the Feldman-Cousins method, and several confidence intervals are listed in Tab. XVI.
It is valuable to see how different values of δ CP , sin 2 θ 23 and the mass ordering affect the predicted event rates. Figure 72 shows the predicted ν e event rate vs ν e event rate for true values of oscillation parameters where δ CP is varied between CP conserving and maximally CP violating values, and sin 2 θ 23 is varied around maximal mixing, for both mass orderings. Predicted event rates for a given value of sin 2 θ 23 and mass ordering are linearly interpolated between those computed for 9 evenly-spaced values of δ CP from −π to +π to indicate the behavior produced by varying δ CP . The observed number of FHC and RHC 1-ring electron-like candidate events falls on the edge of the 1σ uncertainty region generated around true δ CP = −π/2, sin 2 θ 23 = 0.5, ∆m 2 32 = 2.45 × 10 −3 eV 2 /c 4 and normal mass ordering, and is therefore consistent with this point.

XIV. νe APPEARANCE ANALYSIS
This section, which differs from the main oscillation analysis reported above, evaluates the significance of the ν µ → ν e oscillation under the assumption of two different hypotheses, corresponding to no ν e appearance, and to ν e appearance consistent with our current knowledge of the PMNS mixing parameters. To date, the world's best measurement of ν e appearance has been made by the NOνA collaboration [13], which reports an excess of 4.4σ over the expected background. The T2K measurement of ν e appearance has been made using the same analysis framework described in Section XI and has been performed by analyses A and C.
The ν e appearance analysis is performed by multiplying the ν µ → ν e PMNS oscillation probability by a factor, β, i.e. P (ν µ → ν e ) = β × P PMNS (ν µ → ν e ). The parameter β is set to either 0 or 1 to select a null hypothesis for two independent tests: when β = 0, the null hypothesis under consideration is that there is no ν e appearance, while for β = 1 the null hypothesis is that ν e appearance occurs according to the current best knowledge of the PMNS parameters. For each hypothesis, p-values are produced from two analyses: a rate-only whose test statistic is the number of candidate events in the RHC one-ring e-like sample, and one rate+shape analysis whose test statistic is the difference in marginal negative log-likelihood values between the β = 0 and β = 1 cases, denoted ∆χ 2 , as in Eq. 19.
Here the use of χ 2 is taken to be synonymous with −2 ln L marg . Unlike in the main oscillation analysis, the likelihoods are not only marginalized over the flux, crosssection and detector parameters, but also over all oscillation parameters except β, including δ CP and the mass ordering, using 2×10 5 samples of the nuisance parameter space. The number of pseudo-experiments and the number of samples used during marginalization were selected to ensure the stability of the p-values.
To calculate p-values, the data are compared to distributions of the test statistics from ensembles of pseudoexperiments produced under the assumption of either β = 0 or β = 1. Each pseudo-experiment is produced by randomizing nuisance parameters according to the prior probability density functions (PDF) in Section XI. Additionally, a uniform PDF in the range [−π, +π] is used for δ CP , and a two-point PDF is used for the mass ordering with equal probability for normal and inverted ordering.
T2K data from four "control samples" (FHC singlering e-like and ν e CC1π-like and both neutrino and RHC single-ring µ-like) are used to constrain the oscillation and systematic model parameters. The impact of these four data control samples is estimated differently in the two analyses, A and C. In the latter, each of the 2 × 10 4 pseudo-experiments used to build the distributions of the rate-only and rate+shape test statistics is weighted by its likelihood over the four control samples, given the T2K data. Analysis A uses an alternative method, applying this constraint by using rejection sampling to select the pseudo-experiments that are most probable according to the data in the control samples.
Due to the presence of the four control samples, the marginal likelihood used to calculate ∆χ 2 , the rate+shape test statistic, is also constructed differently from the main oscillation analysis. For each pseudoexperiment, the marginal likelihood is constructed based on the prediction in the RHC single-ring e-like sample, while its background is constrained using the four control samples. This is reflected in Eq. 20, where L νe denotes the likelihood of the RHC single-ring e-like sample compared to the pseudo-experiment, E, L c is the product of the likelihoods of the four control samples compared to the T2K data, D, and f j the set of nuisance parameters. L νe β, f j ; E · L c β, f j ; D (20) The expected and observed test statistic distributions produced using analysis A are shown in Fig. 73 and the corresponding p-values are shown in Tab. XX. The hypothesis of no ν e appearance (β = 0) is excluded at the 1.9σ and 2.4σ levels, respectively using the rate-only and rate+shape analyses. The observed p-values provide a weaker exclusion of the β = 0 case than expected with Asimov data set, in both rate-only and rate+shape analyses. This is primarily due to observing fewer events (15) in the RHC single-ring e-like sample than expected (16.8), and strengthened by their relatively backgroundlike spectrum in the rate+shape analysis, as shown in Fig. 44 (d). Our data are consistent with the PMNS ν e appearance hypothesis (β = 1), with p-values of 0.32 and 0.30 for the rate-only and rate+shape analyses, respectively (corresponding to a 1σ exclusion). Analyses A and C both produce test statistic distributions and p-values that are in good agreement with each other, where minor differences between them were determined to result from the difference in kinematic variables used to bin the analysis templates.  It is also desirable to test the robustness of these results to various alternative choices of parts of the interaction model which have a non-negligible effect on the kinematic distributions of the RHC single-ring e-like sample. This is done using Analysis C by weighting the nominal near and far detector MC (see Sec. XII). Three simulated data sets are studied: the Kabirnezhad single pion production model, the Nieves LFG model and the data-driven CC 0π E ν − Q 2 simulated data, described in Sec. VI. To test their effects on the observed results of the ν e appearance analysis, the test statistic distributions are weighted by the ratio of the distribution produced using the alternative model to the expected distribution. Similarly, the observed test statistics are shifted by the difference between the median expected pseudoexperiment with and without the use of the alternative model. The change in the observed p-values from these alternative models are shown in Tabs. XXI and XXII for the rate and rate+shape analyses respectively. The observed changes are small compared to the nominal pvalue and the changes do not affect the conclusions, so the analyses are considered robust against alternative model choices.

XV. CONCLUSIONS
The T2K collaboration has analyzed the full data set collected by the experiment between 2010 and 2018 to produce measurements of sin 2 θ 23 , ∆m 2 32 , sin 2 θ 13 , δ CP and the mass ordering. The parameter values and uncertainties are taken from a simultaneous fit to both muon-like and electron-like event samples at Super-Kamiokande, collected from both neutrino-dominated and antineutrino-dominated beam operation. This analysis uses a new event selection at SK to increase the effective fiducial volume of the detector, resulting in a 20% increase in the electron-like sample efficiency and a 40% reduction in the background contamination in the muonlike samples. The neutrino interaction model used for this work is improved relative to Ref. [26], incorporating in-medium effects (RPA) and 2p2h shape uncertainties in the charged-current zero-pion signal channel. A detailed set of simulated data studies were also performed to assess the robustness of the analysis to alternative choices of neutrino interaction model. These studies demonstrated that the nucleon removal energy uncertainty can have a significant effect on the allowed regions for sin 2 θ 23 and ∆m 2 32 and, to a lesser extent, ∆m 2 32 was also sensitive to all of the alternative models studied. This led to the addition of new systematic uncertainties to the SK samples and oscillation contours to account for these effects. The results of these simulated data studies are an important outcome of this analysis: long-baseline experiments are now entering the precision era, and changes of a few percent in the reconstructed neutrino energy can have significant effects on the oscillation parameters observed by the experiment.
The T2K oscillation parameter measurements are, however, limited by statistics. T2K will collect more data in both neutrino and antineutrino beam operation mode. More complex event topologies will be included in the analyses at both ND280 and Super-Kamiokande, and the collaboration is working towards joint analyses with both the Super-Kamiokande and the NOνA experiments. The combination of new topologies and differing neutrino energies will lift the degeneracies between oscillation parameters present in a single experiment. This will enable the most sensitive measurements of neutrino oscillations to date, and will provide a template for future analyses at the next generation of long-baseline experiments. The data related to the measurement and results presented in this paper can be found in [101].  Tables XXIII and XXIV for the flux parameters, and  Table XXV for the cross section parameters.