Atmospheric neutrino oscillation analysis with external constraints in Super-Kamiokande I-IV

An analysis of atmospheric neutrino data from all four run periods of \superk optimized for sensitivity to the neutrino mass hierarchy is presented. Confidence intervals for $\Delta m^2_{32}$, $\sin^2 \theta_{23}$, $\sin^2 \theta_{13}$ and $\delta_{CP}$ are presented for normal neutrino mass hierarchy and inverted neutrino mass hierarchy hypotheses based on atmospheric neutrino data alone. Additional constraints from reactor data on $\theta_{13}$ and from published binned T2K data on muon neutrino disappearance and electron neutrino appearance are added to the atmospheric neutrino fit to give enhanced constraints on the above parameters. Over the range of parameters allowed at 90% confidence level, the normal mass hierarchy is favored by between 91.5% and 94.5% based on the combined result.


I. INTRODUCTION
The principal goal of contemporary neutrino oscillation experiments is to fully test the three-neutrino mixing paradigm based on the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix [1,2]. This paradigm is characterized by three mixing angles, two mass splittings, and one CP-violating phase. Some neutrino mixing parameters have been experimentally determined, such as the magnitude of the two mass splittings, the ordering of the mass states with the smallest splitting, and the values of the mixing angles. In particular, measurements by reactor antineutrino [3][4][5] experiments and T2K [6] have established that the mixing angle θ 13 is small but non-zero and they have precisely measured its value. There remain unknown parameters in the PMNS formalism, most notably the ordering of the mass states with the largest splitting, which is mathematically expressed as the sign of ∆m 2 31 , and is commonly referred to as the neutrino mass hierarchy. Although it is known that muon and tau neutrino mixing is nearly maximal, i.e. θ 23 is near π/4, it is not known if θ 23 takes exactly that value, or is slightly larger or slightly smaller [7,8]. With all three neutrino * Deceased.
† also at Department of Physics and Astronomy, UCLA, CA 90095-1547, USA. ‡ also at BMCC/CUNY, Science Department, New York, New York, USA.
flavors and mass states mixing, it is possible to measure the unknown CP-violating phase δ CP and perhaps conclude that neutrinos and antineutrinos have different oscillation probabilities, if it is found that δ CP is neither 0 nor π. The value of δ CP is considered to be unknown, although the T2K and NOvA long-baseline experiments, and the results published in this paper, are beginning to constrain it [8,9].
Due to the presence of neutrinos and antineutrinos, the effects of matter on neutrino oscillations, and the wide variety of energies and pathlengths spanned, atmospheric neutrinos are sensitive to the unknown parameters of the PMNS formalism. The measurement of the mass hierarchy is driven by an expected hierarchy-dependent, upward-going excess of either electron neutrino or antineutrino interactions driven by θ 13 -induced matter effects between 2 and 10 GeV. In order to take advantage of this phenomenon, sign selection of neutrino interactions and sufficient statistics are necessary. It should be noted that the size of this event excess is a function of θ 23 , and as will be discussed below, constraints on this parameter improve sensitivity to the hierarchy. Determining the mass hierarchy and measuring θ 23 play an important role in interpreting any neutrino versus antineutrino oscillation difference and thereby establishing CP violation.
In this paper we analyze 328 kiloton·years of Super-Kamiokande (Super-K,SK) atmospheric data. The sensitivity of our experiment is not sufficient to definitively resolve the unknown parameters. In particular we are limited by low statistics and difficult event classification in the high-energy hierarchy-sensitive sample. Nevertheless, we analyze the atmospheric neutrino data in a manner optimized for sensitivity to the mass hierarchy and report our best estimates and confidence intervals. We present results with and without constraints from external experiments. In Section II atmospheric neutrino oscillations are reviewed before discussing the Super-K detector and data set in Section III. An analysis of the atmospheric neutrino data by themselves is then presented in Section IV and followed by an analysis incorporating constraints from external measurements in Section V. These results are interpreted in Section VI before concluding in Section VII.

A. In Vacuum
Neutrinos oscillate because the neutrino eigenstates of the weak interaction are different from the neutrino mass eigenstates. The flavor eigenstates ν α are related to the mass eigenstates ν i by where U is the 3x3 Pontecorvo-Maki-Nakagawa-Sakata (PMNS) matrix [1,2]  Here c ij = cos θ ij , s ij = sin θ ij . Propagation of these states according to their vacuum Hamiltonians leads to the standard oscillation formula for relativistic neutrinos in vacuum where ∆ ij = 1.27∆m 2 ij (eV 2 )L(km) E(GeV) and the sign before the second summation is positive for neutrinos and negative for anti-neutrinos. Neutrino oscillations in vacuum are thus fully described by 6 parameters: the 3 mixing angles θ 13 , θ 12 , θ 23 , the two mass splittings ∆m 2 21 , ∆m 2 31 , and the CP-violating phase δ CP .

B. In Matter
For this analysis, exact 3-flavor oscillation probabilities with matter effects included are used. When neutrinos travel through matter, the effective Hamiltonian is modified from its vacuum form due to the difference in the forward scattering amplitudes of ν e and ν µ,τ (presented here in the mass eigenstate basis): where a = ± √ 2G F N e , G f is the Fermi constant, N e is the electron density, U is the PMNS matrix, and the plus (minus) sign is for neutrinos (antineutrinos). Following Ref. [11], the matrix X, whose row vectors are the propagated mass eigenvectors, can be written as: where the M 2 i /2E are the eigenvalues of the constantdensity matter Hamiltonian H matter , and I is the identity matrix. The oscillation probability can then be written as: Region Rmin (km) Rmax (km) density (g/cm 3 )  inner core  0  1220  13.0  outer core  1220  3480  11.3  mantle  3480  5701  5.0  crust  5701  6371  3.3   TABLE I. Model of the Earth used in the analysis, a simplified version of the PREM.
The eigenvalues M 2 i /2E have been found (with a since corrected sign error) as Equations 21 and 22 of Ref. [11].
In general, an atmospheric neutrino can pass through various densities of matter on its way to our detector. The Earth's atmosphere is modeled as vacuum, and the Earth as a sphere of radius 6371 km, with a spherical density profile which is a simplified version of the preliminary reference Earth model (PREM) [12], as shown in Table I.
The use of the full PREM model with 82 layers provides no perceptible change in the sensitivity of the Super-Kamiokande analysis, so the simplified matter profile is adopted to reduce computation times. To calculate the oscillation probability of a neutrino with energy E produced at a height h above the surface of the Earth, the path from our detector to the neutrino production location is traced through N steps across the atmosphere and different regions of the Earth's interior ( Figure 1). Note that because the Earth is modeled as spherically symmetric, this path is a function of only the production height and zenith angle; it is independent of azimuthal angle. The oscillation probability for a given neutrino is calculated by stepping along its path: where L i and ρ i are the length and density of the i th step. Figure 2 shows the ν µ survival and ν µ → ν e transition probabilities for neutrinos and antineutrinos assuming the normal mass hierarchy. Resonant enhancement of the ν µ → ν e transition is clear for upward-going neutrinos of several GeV. If the inverted hierarchy were assumed this resonance would occur for antineutrinos instead.

III. THE SUPER-KAMIOKANDE DETECTOR
Super-Kamiokande is a cylindrical 50-kiloton water Cherenkov detector, located inside the Kamioka mine in Gifu, Japan. An inner detector (ID) volume is viewed by more than 11,000 inward-facing 20-inch photomultiplier tubes (PMTs) and contains a 32-kiloton target volume. The outer detector, which is defined by the two meter-thick cylindrical shell surrounding the ID, is lined with reflective Tyvek to increase light collection to 1,885 outward-facing eight-inch PMTs mounted on the shell's inner surface. Since the start of operations in 1996, The propagation of two neutrinos through the simplified model of the Earth used in the analysis below. Both νA and νB are produced in the atmosphere. νA then experiences 6 oscillation steps (air → crust → mantle → outer core → mantle → crust), while νB experiences 4 oscillation steps (air → crust → mantle → crust).
Super-Kamiokande has gone through four data taking periods, SK-I, -II, -III, and -IV. Though the basic configuration the detector is similar across the phases, following an accident in 2001, the SK-II detector was operated with the half the PMTs (20% photocathode coverage) of the other periods (40% coverage). Since that time all ID PMTs have been encased in fiber reinforced plastic shells.
Neutrino interactions which produce charged particles above the Cherenkov threshold in water are reconstructed based on the observed ring patterns projected on the detector walls. Photomultiplier timing information is used to reconstruct the initial interaction vertex after correcting for the photon time of flight. Particles are divided into two broad categories based upon their Cherenkov ring pattern and opening angle. Rings from particles which produce electromagnetic showers, such as electrons and photons, tend to have rough edges due to the many overlapping rings from particles in the shower and are labeled e-like or showering. Muons and charged pions on the other hand, which do not form showers, produce Cherenkov rings with crisp edges. Such rings are labeled µ-like or non-showering. The event reconstruction assigns momenta to each reconstructed ring in an event based on the observed number of photons in the ring. Particles with higher momenta produce brighter Cherenkov rings. Similarly, particle directions are inferred based on the shape of their ring pattern. Since the neutrino itself is unobserved, energy and direction variables for use in the oscillation analysis described below are based on the properties of their daughter particles.
Oscillation probabilities for neutrinos (upper panels) and antineutrinos (lower panels) as a function of energy and zenith angle assuming a normal mass hierarchy. Matter effects in the Earth produce the distortions in the neutrino figures between 2 and 10 GeV, which are not present in the antineutrino figures. For an inverted hierarchy the matter effects appear in the antineutrino figures. Here the oscillation parameters are taken to be ∆m 2 32 = 2.5×10 −3 eV 2 , sin 2 θ23 = 0.5, sin 2 θ13 = 0.0219, and δCP = 0.
At the start of the SK-IV period in 2008 the front-end electronics were upgraded to a new system with an ASIC based on a high-speed charge-to-time converter [13]. The new system allows for the loss-less data acquisition of all PMT hits above threshold and has improved the tagging efficiency of delayed Michel electrons from muon decay from 73% in SK-III to 88%. More detailed descriptions of the detector and its electronics can be found in [13][14][15].

A. Detector Calibration
Over the 20 year history of the experiment changes in the run conditions have been unavoidable. Seasonal changes in precipitation and the expansion of underground activities at the Kamioka site have variable impact on the quality and quantity of underground water available to fill the detector and maintain its temper-  ature. These changes impact the water transparency and subsequent performance of the detector and therefore must be corrected through calibrations. Since neutrino oscillations are a function of the neutrino energy, a thorough understanding of the detector energy scale is important for precision measurements.
At the same time the range of energies of interest to atmospheric neutrino analysis spans from tens of MeV to tens of TeV, eliminating the possibility of calibration through radioactive isotopes. Accordingly, the energy scale is calibrated using natural sidebands covering a variety of energies. Neutral pions reconstructed from atmospheric neutrino interactions provide a calibration point via the π 0 momentum and stopping cosmic ray muons of various momenta are used to measure photoelectron production as a function of muon track length (Cherenkov angle) for multi-GeV (sub-GeV) energies. Here the muon track length is estimated using the distance between the entering vertex and the position of the electron produced in its subsequent decay. The energy spectrum of these Michel electrons additionally serves as a low energy calibration point. Figure 3 shows the absolute energy scale measurement using each of these samples.
In the oscillation analysis the absolute energy scale uncertainty is conservatively taken to be the value of the most discrepant sample from this study in each run period. The total systematic error is assigned taking this value summed in quadrature with the time variation of the energy scale, which is measured using the variation in the average reconstructed momentum of Michel electrons and the variation in the stopping muon momentum divided by range. An example of the latter showing the energy scale stability since SK-I appears in Figure 4. Note the SK-III period was subject to poor and volatile water transparency conditions, resulting in a comparatively turbulent energy scale. The stability seen in the SK-IV period is a result of improvements in the water purification system and in corrections for the time variation of the PMT response. The total energy scale uncertainty in each period has been estimated as 3.3% in SK-I, 2.8% in SK-II, 2.4% in SK-III, and 2.1% in SK-IV. Energy scale stability measured as a function of date since the start of SK operations. The energy scale is taken as the average of the reconstructed momentum divided by range of stopping cosmic ray muon data in each bin. The vertical axis shows the deviation of this parameter from the mean value for each SK period separately. Error bars are statistical.

B. Sample Selection
The current analysis utilizes atmospheric neutrino data collected during each of the SK run periods and corresponds to a total livetime of 5,326 days, 2,519 of which are from SK-IV. Super-Kamiokande's atmospheric neutrino data are separated into three broad categories, fully contained (FC), partially contained (PC) and upwardgoing muons (Up-µ) that are further sub-divided into the final analysis samples. Fully contained events have a reconstructed vertex within the 22.5 kton fiducial volume, defined as the region located more than 2 m from the ID wall, and with no activity in the OD. The FC data are sub-divided based upon the number of observed Cherenkov rings, the particle ID (PID) of the most energetic ring, and visible energy or momentum into combinations of single-or multi-ring, electron-like (e-like) or muon-like (µ-like), and sub-GeV (E < 1330.0 MeV) or multi-GeV (E > 1330.0 MeV). Additional selections are made based on the number of observed electrons from muon decays and the likelihood of containing a π 0 . For the SK-I, -II, and -III data periods the latter selection is based on [16] and for SK-IV it is performed using the improved algorithm presented in [6]. After all selections there are a total of 14 FC analysis samples. Events with a fiducial vertex but with energy deposition in the OD are classified as PC. Based on the energy deposition within the OD, PC events are further classified into "stopping" and "through-going" subsamples.
The Up-µ sample is composed of upward-going muon events produced by neutrino interactions in the rock surrounding SK or in the OD water. Accordingly, light deposition in both the OD and ID is expected and the sample is divided into "through-going" and "stopping" subsamples for events that cross or stop within the ID, respectively. Through-going events with energy deposition consistent with radiative losses are separated into a "showering" subsample. The 19 analysis samples defined for each of the SK run periods are summarized in Table II. Zenith angle distributions of each sample are shown in Figure 5. Distributions of the true neutrino energy for the FC, PC, and Up-µ event categories appear in Figure 6. Their event rates over the lifetime of the experiment have been stable at 8.3 FC events per day, 0.73 PC events per day, and 1.49 Up-µ events per day, as shown in Figure 7.
As outlined in Section II the primary handle for distinguishing the normal from the inverted mass hierarchy is whether neutrinos or antineutrinos undergo resonant  Table II) are shown as zenith angle distributions (second through fifth column) and other samples are shown as reconstructed momentum distributions (first column). Cyan (orange) lines denote the best fit MC assuming the normal (inverted) hierarchy. Narrow panels below each distribution show the ratio relative to the normal hierarchy MC. In all panels the error bars represent the statistical uncertainty.
oscillations as they traverse the earth. The effect of resonant oscillations would manifest most prominently as an excess of upward-going e-like events at O(GeV) energies driven by ν µ → ν e oscillations, so extracting the mass hierarchy requires separation of ν e fromν e interactions. As the SK detector is insensitive to the charge sign of particles traversing the detector, charged-current (CC) neutrino interactions and antineutrino interactions cannot be differentiated on an event-by-event basis. Instead this separation is done statistically. It should be noted that due to their larger cross section and higher flux, more than twice as many neutrino interactions are expected in the data. Further, while hierarchy-sensitive matter effects are also present in the ν µ → ν µ channel, attempts to similarly separate the µ-like data yielded no significant change in sensitivity and are not considered here.
Between two and ten GeV, in addition to charged-current quasi-elastic interactions, single-pion (1π) production via ∆ resonance excitation and deep inelastic scattering (DIS) processes are significant. In the case of the former, the outgoing π − in antineutrino reactions, such asν e + n → e + nπ − , will often capture on a 16 O nucleus leaving the positron as the only Cherenkov lightemitting particle. Neutrino interactions, on the other hand, are accompanied by a π + , such as in ν e + n → e − nπ + , where the π + does not capture in this manner and can therefore survive long enough to produce a delayed electron through its decay chain. For CCν e interactions in which the π − has captured there will be no such decay electrons. Accordingly, an antineutrino enriched subsample is extracted from the single-ring multi-GeV e-like sample by additionally requiring there are no decay electrons present. This cut defines the single-ring multi-GeVν e -like sample and its rejected events form the single-ring multi-GeV ν e -like sample. After this se- lection the fractions of charged-current electron neutrino and antineutrino events in the ν e -like sample are 62.1% and 9.0%, respectively. For theν e -like sample the fractions are 54.6% and 37.2%.
At these energies, events with more than one reconstructed ring are often DIS interactions, which produce both multiple charged pions and nuclear fragments. In order to purify the neutrino and antineutrino components of the multi-ring samples a two-stage likelihood method has been developed. Due to the presence of several lightproducing particles the Cherenkov ring produced by the leading lepton is often obscured, resulting in degraded PID performance and accordingly, significant NC and ν µ -induced backgrounds in multi-ring events whose most energetic ring is e-like. The first stage of the separation is designed to extract and purify CC ν e +ν e interactions from this base sample. To perform this selection a likelihood function, detailed in a previous publication [16], is built from the PID variable of the event's most energetic ring, the fraction of the event's total momentum it carries, the number of decay electrons, and the largest distance to a decay electron vertex from the primary event vertex. The efficiency of this method for selecting true CC ν e +ν e events is 72.7% and results in a sample that is 73.0% pure in these interactions. Separate likelihoods are prepared for each of the run periods and yield similar efficiencies and purities. Events that pass this selection are classified as "multi-ring e-like" while those that fail are termed "multi-ring other." Though the multi-ring other sample has not been used in previous Super-K oscillation analyses it is introduced here since its ν e component offers some hierarchy sensitivity and its oscillation-induced ν τ and NC components provide additional constraints on related systematic uncertainties.
The second stage of the separation process focuses on separating samples enriched in neutrino and antineutrino interactions from the multi-ring e-like data. A second likelihood method is introduced based on three variables, the number of reconstructed rings, the number of decay electrons, and the event's transverse momentum. For charged-current interactions the conservation of charge implies the total charge of the recoiling hadronic system must be positive to balance the negative charge of the out-going lepton. The total charge carried by hadrons emerging from antineutrino interactions, on the other hand, will be zero or negative. As a result, the charged pion multiplicity, and hence number of visible Cherenkov rings, in neutrino-induced events is expected to exceed that from antineutrino events. This difference is enhanced by the propensity for π − to capture in water. In combination these two effects suggest that more electrons from the π decay chain are expected in ν interactions. Due to the V-A structure of the weak interaction, the angular distribution of the leading lepton fromν interactions is more forward than those from ν processes. As a result, the transverse momentum of the system is expected to be smaller for the former. Since there is no direct knowledge of an incoming atmospheric neutrino's direction the transverse momentum of each event is defined relative to the direction of the most energetic ring. The final likelihood is defined over five visible energy bins, 1.33-2.5 GeV , 2.5-5.0 GeV, 5.0-10.0 GeV, 10.0-20 GeV and > 20 GeV for each SK run period. Figure 8 shows the combined likelihood distribution used in SK-IV. The efficiency for identifying true CCν e (ν e ) events asν e -like is 71.5% (47.1%).

C. Simulation
The simulation of atmospheric neutrinos is performed following the flux calculation of Honda et. al [17] and using the NEUT [18] simulation software (version 5.3.6) to generate neutrino interactions for tracking in a GEANT3 [19]-based simulation of the Super-K detector [15]. Several improvements to NEUT have been made since the previous version used for atmospheric neutrino analysis (c.f. [20]). Charged-current quasielastic interactions are simulated using the Llewellyn-Smith model [21] with nucleons distributed according to the Smith-Moniz relativistic Fermi gas [22] assuming an axial mass M A = 1.21GeV/c 2 and form factors from [23]. Interactions on correlated pairs of nucleons, so-called meson exchange currents (MEC), have been included following the model of Nieves [24]. Pion-production processes are simulated using the Rhein-Seghal model [25]  with Graczyk form factors [26]. Since the MEC simulation includes delta absorption processes, the pionless ∆ decay process, ∆ + N → N + N , in NEUT's previous pion production model has been removed in the present version.

IV. ATMOSPHERIC NEUTRINO ANALYSIS
Three fits, each incorporating a different degree of external information, are performed to estimate oscillation parameters. In the first and least-constrained fit, the Super-K atmospheric neutrino data are fit allowing θ 13 to vary as a free parameter. The second fit similarly uses only atmospheric neutrino data, but assumes θ 13 to be the average of several reactor neutrino disappearance experiment measurements, sin 2 θ 13 = 0.0219 ± 0.0012 [27]. Finally, the T2K samples discussed in section V are fit alongside the atmospheric neutrino data under the same assumption. In each of these fits the data are fit against both the normal and inverted hierarchy hypotheses. Data are fit to the MC using a binned χ 2 method built assuming Poisson statistics and incorporating systematic errors as scaling factors on the MC in each bin [28]: where, In this equation E n,j represents the MC expectation in the n th analysis bin for the j th SK period. Similarly, O n,j is the corresponding data in that bin and f i n,j is a coefficient describing the fractional change in the bin's MC under a 1σ i variation of the i th systematic error source. Systematic errors penalize the χ 2 based on their corresponding fitting parameters, i . Solving the system of equations defined by the requirement ∂χ 2 /∂ i = 0 for each systematic error brings the data and MC into the best agreement allowed by the systematic errors. This minimization in the systematic error parameters is repeated over a grid of oscillation parameters and the parameter set returning the smallest value of χ 2 is taken as the best fit.
The fit is performed over 520 analysis bins for each of the SK periods and a total of 155 systematic error sources. In addition, a systematic error on the presence of meson exchange currents has been added to the analysis where the difference between the NEUT model with and without MEC is taken as the 1σ uncertainty. Further, the single pion production error of previous analyses has been broken down into three parts following the parameterization of Ref. [26]. Systematic errors and their sizes at the best fit point of the analysis are presented in Tables VII, VIII, and IX. When the atmospheric data are studied without external constraints the fit is performed over four parameters.  Table III but their uncertainties are treated as a source of systematic error in the analysis. For the normal (inverted) hierarchy fit the fitting parameter is ∆m 2 32 (∆m 2 31 ). Independent fits are performed for the normal and inverted hierarchies and the grid point returning the smallest value of χ 2 is termed the best fit for each. The smallest of these is taken as the global best fit.
Further, the compatibility of the atmospheric neutrino data with oscillations subject to matter effects in the Earth is evaluated by performing the same fits with sin 2 θ 13 constrained to 0.0219 ± 0.0012 (discussed below) and introducing an additional scaling parameter on the electron density in Equation 5, α. This parameter is allowed to range in 20 steps from 0.0 to 1.9, with α = 1.0 corresponding to the standard electron density in the Earth.
Results and Discussion Figure 9 shows one-dimensional allowed regions for |∆m 2 32,31 |, sin 2 θ 23 , θ 13 and δ CP . In each plot the curve is drawn such that the χ 2 for each point on the horizontal axis is the smallest value among all parameter sets including that point. When the atmospheric neutrino data are fit by themselves with no constraint on θ 13 , the normal hierarchy hypothesis yields better data-MC agreement than the inverted hierarchy hypothesis with χ 2 N H,min − χ 2 IH,min = −3.89. The preferred value of sin 2 θ 13 is 0.019(0.008) assuming the former (latter). Though both differ from the globally preferred value of 0.0219 the constraints are weak and include this value at the 1σ level. In the normal hierarchy fit the point at sin 2 θ 13 = 0.0 is disfavored at approximately 2σ indicating the data have a weak preference for non-zero values. A summary of the best fit information and parameter constraints is presented in Table V.
The data's preference for both non-zero sin 2 θ 13 and the normal mass hierarchy suggest the presence of upwardgoing electron neutrino appearance at multi-GeV energies driven by matter effects in the Earth (c.f. Figure 2). Figure 10 shows the up-down asymmetry of the multi-GeV single-and multi-ring electron-like analysis samples.
Here the asymmetry is defined as are the number of events whose zenith angle satisfy cosθ z < −0.4 (cosθ z > 0.4). Excesses seen between a few and ten GeV, in particular in the Multi-GeV e-like ν e and the Multi-Ring e-like ν e andν e samples, drive these preferences.
The normal hierarchy fits to the atmospheric mixing parameters yield ∆m 2 32 = 2.47 +0.17 −0.28 × 10 −3 eV 2 and sin 2 θ 23 = 0.584 +0.039 −0.069 . However, the Super-K data show a weak preference for the second octant of θ 23 , disfavoring maximal mixing (sin 2 θ 23 = 0.5) at around 1σ significance. This preference is driven by data excesses (deficits) at multi-GeV energies in the upward-going regions of the single-ring e-like ν e (µ-like) and multi-ring other samples. These features are consistent with expectations from ν µ → ν e oscillations driven by non-zero θ 13 .
The best fit value of δ CP is found to be 4.18 (3.85) radians in the normal (inverted) fit, with the least preferred parameter value near 0.8 radians disfavored by ∆χ 2 = 2.7 (1.0). This preference is driven predominantly by the sub-GeV e-like samples, via ν µ → ν e oscillations. Though the effect of this parameter at these energies is a complicated function of both energy and the neutrino path length, the point at 4.18 radians generally induces more electron neutrino appearance in the sub-GeV e-like samples. At higher energies the effect of δ CP modulates the θ 13 -driven ν µ → ν e probability in the resonance region, but is secondary in size and induces more (less) appearance at 4.18 (0.8) radians. As there are fewer antineutrino events relative to neutrino events in the atmospheric sample there is accordingly more freedom to adjust θ 13 to bring the MC prediction into agreement with data in the inverted hierarchy fit. As a result a weaker constraint on δ CP is obtained.
The consistency of these data with the presence of mat- For the single-ring samples the energy is taken to be the visible energy assuming the light-producing particle was an electron. For the multi-ring samples the total energy is used after accounting for the particle type (electron or muon) of each reconstructed ring. The cyan line denotes the best fit from the normal hierarchy hypothesis, and the orange line the best fit from the inverted hierarchy hypothesis.

Up -Down / Up + Down
ter effects is illustrated in Figure 11. With sin 2 θ 13 set to 0.0219±0.0012, the data prefer the normal hierarchy with an electron density consistent with that of standard matter (α = 1.0). Purely vacuum oscillations, represented by α = 0.0, are disfavored by the fit by χ 2 α=0 − χ 2 min = 5.2 after accounting for the hierarchy uncertainty. Based on toy Monte Carlo studies, this corresponds to a significance of excluding vacuum oscillations at 1.6σ.

V. ATMOSPHERIC NEUTRINOS WITH EXTERNAL CONSTRAINTS
Though the atmospheric neutrino data are sensitive to the values of θ 13 , θ 23 , and |∆m 2 32 |, the size of the mass hierarchy signal is a function of these parameters. As such, larger uncertainties translate directly into reduced hierarchy sensitivity. Indeed, toy MC data sets which were generated with a particular hierarchy but were best fit to the alternative hierarchy often preferred values of the atmospheric mixing parameters different from the input values. For example, a true normal hierarchy MC generated with θ 23 in the lower octant can be reasonably fit by the inverted hierarchy hypothesis and the second octant of this parameter. Since there is relatively Constraints on the matter effect parameter α from the Super-K atmospheric neutrino data fit assuming sin 2 θ13 = 0.0219 ± 0.0012 . Orange lines denote the inverted hierarchy result, which has been offset from the normal hierarchy result, shown in blue, by the difference in their minimum χ 2 values. Vacuum corresponds to α = 0, while the standard matter profile used in the rest of the analyses presented here corresponds to α = 1.
poor separation between neutrino and antineutrino interactions, the expected increase in the event rates in both scenarios is roughly equal. Restricting the allowed regions of the atmospheric mixing parameters therefore provides increased hierarchy sensitivity by effectively removing such degenerate combinations. The constraints adopted in the present analysis are based exclusively on information available in the literature and are described below.

A. Reactor Constraint on θ13
Currently the most precise measurements of sin 2 2θ 13 come from the Daya Bay, RENO, and Double Chooz experiments [3][4][5]. In the analysis described below the central value of this parameter is taken to be sin 2 θ 13 = 0.0219 ± 0.0012 based on the average of these measurements presented in [27]. A systematic error representing the size of the uncertainty from this average is incorporated in the analysis.

B. Constraints from T2K
The T2K (Tokai-to-Kamioka) long-baseline neutrino experiment sends a beam composed primarily of ν µ from Tokai-village, Japan, 2.5 • off-axis toward the Super-Kamiokande detector 295 km away. A complex of detectors (the near detectors) located 280 m downstream of the neutrino production point and at the same off-axis angle is used to measure the unoscillated beam spectrum and to thereby constrain the expected spectrum at Super-K (the far detector). A sharp beam profile peaking at 600 MeV is expected at the far detector and provides for sensitive measurements of θ 23 and ∆m 2 32 . Currently T2K's measurements [8,29] of these parameters are more constraining than the Super-K atmospheric neutrino measurement and provide a statisticallyindependent constraint. These together with inherent correlations in some systematic error sources, such as the detector response and cross section model, make T2K a powerful input to the Super-K hierarchy analysis. A more detailed description of the T2K experiment is presented elsewhere [30].  [31]. Rates correspond to the number of interactions per 1.0 × 10 21 protons on target.
Since Super-K serves as the far detector for T2K many aspects of the experiments are shared. Notably the detector simulation as well as the neutrino interaction generator, NEUT [32], and the event reconstruction tools at Super-K are common between the two. From the standpoint of Super-K then, only the neutrino source and associated systematics differ between the beam and atmospheric neutrino measurements. For this reason it is possible to create a reliable simulation of the T2K experiment using software and methods specific to atmospheric neutrino measurements, provided only information about the beam flux and systematic errors. Accordingly, in addition to the 19×4 data samples presented in Section III, simulated T2K ν e appearance and ν µ disappearance samples are introduced into the atmospheric analysis in order to directly incorporate T2K's measurements. Monte Carlo corresponding to these samples is constructed from reweighted atmospheric neutrino MC and data are taken from the literature. This scheme allows various oscillation hypotheses to be tested against the published T2K data and in conjunction with the Super-K data. Provided the model samples reproduce T2K's results when fit without the atmospheric neutrino data, the results of a combined analysis can be taken as reliable. Neutrino MC samples at Super-K are generated according to the Honda 2011 flux calculation [17] and a sample equivalent to a 500 year exposure of the SK-IV detector, the run period which contains the T2K beam data, is reweighted according to the beam flux prediction presented in [31]. Detailed predictions assuming no oscillations are available for the ν µ ,ν µ , ν e , andν e components of both the beam and atmospheric fluxes at Super-K. Atmospheric neutrino interactions are reweighted according to neutrino flavor, arrival direction, and energy to match the beam spectrum. Though the T2K beam enters the Super-K tank from one direction and atmospheric neutrinos enter from all directions, the uniformity of the detector's response is such that this reweighting results in negligible biases in the model samples. Both T2K analysis samples considered here are fully contained interactions based on the same fiducial volume as the atmospheric neutrino sample. The normalization of the reweighted MC (hereafter beam MC) is computed based on the total neutrino interaction cross section on 22.5 kton of water convolved with the beam flux. Table IV lists the interaction rate for 1.0 × 10 21 protons on the T2K target for several combinations of neutrino flux and cross section.
Separate T2K e-like and µ-like samples are constructed from the beam MC using the selection criteria presented in [6] and [33], respectively. Both samples are com-posed of fully contained fiducial volume events with more than 30 MeV of visible energy and a single reconstructed Cherenkov ring. To be included in the e-like sample the PID of the ring is required to be e-like and must have more than 100 MeV of visible energy. Additionally, there must not be any activity consistent with the electron from a decayed muon and the reconstructed neutrino energy (described below) must be less than 1250 MeV. A final cut designed to reduce backgrounds from NC π 0 interactions is applied according to [6]. Events whose Cherenkov ring has µ-like PID with a momentum greater than 200 MeV/c and at most one decay electron comprise the µ-like sample.
During the analysis, both samples are binned using the reconstructed neutrino energy calculated assuming charged-current quasi-elastic (CCQE) interactions in water: Here M n (M p ) is the neutron (proton) mass and V nuc is the average nucleon binding energy in 16 O, 27 MeV. The charged lepton mass, m l , is assumed to be that of an electron for the e-like sample and that of a muon for the µ-like sample. Similarly, the total energy, E l , is computed for each sample using the corresponding m l and the reconstructed momentum, P l . The cos θ term represents the opening angle between the neutrino and lepton directions, which is computed using MC truth information for the parent neutrino and the reconstructed direction of the charged lepton ring. Though the official T2K analyses use maximum likelihood methods, without detailed information of each data event, reproducing the analyses exactly using only published information is infeasible. Instead the data are binned as specified in the T2K publications. The e-like sample uses 50 MeV wide bins evenly spaced from 100 to 1250 MeV and the µ-like sample uses 50 MeV bins from 0.2 to 3.0 GeV, 100 MeV wide bins from 3.0 to 5.0 GeV, and a single bin for more energetic events. A critical component of the T2K analysis is the constraint coming from measurements of the unoscillated neutrino flux and interactions at its near detector complex. Measurements of the CC ν µ interaction rate adjust the central values and uncertainties on parameters describing the flux and cross section models underlying the simulation at Super-K. Incorporation of these constraints alters the shape and composition of the expected spectrum at Super-K and is therefore essential for an accurate reproduction of the T2K results. Energy dependent normalization parameters for the beam's ν µ ,ν µ , ν e , andν e flux components from [29] are applied as additional weighting factors for the beam MC. Constraints on the interaction model, such as the value of ax-ial mass parameters for quasi-elastic processes and pion production interactions via the ∆ resonance, as well as the CCQE, CC single pion, and NC π 0 cross section normalizations are similarly incorporated as multiplicative weighting factors. For example, the T2K-measured change in the CCQE axial mass parameter, M QE A from the default value of 1.21 ± 0.45 to 1.33 ± 0.20 is incorporated into the present analysis by computing the ratio of the CCQE cross section for each MC event based on its generated lepton and hadron kinematics. Errors assigned to the flux and cross section parameters in [29] are used in the construction of systematic error response coefficients discussed below. It should be noted however, that the complete spectral response of the T2K error model is not publicly available, and the influence of systematic errors is often expressed as the expected change in each sample's event rate. In these cases the error model used in the atmospheric neutrino analysis is adapted to produce the same event rate change in the T2K samples. In the combined analysis of atmospheric data and the T2K model, detector and cross section systematic errors are considered completely correlated between the two data sets, while the flux errors are uncorrelated.
The model constructed here is based on 6.57 × 10 20 protons on target taken with T2K's neutrino-enhanced beam. Though antineutrino data and contours are available in the literature (c.f. [34]), the statistics are too low to impact the sensitivity of the present analysis and are not included in the model. Figure 12 shows a comparison of the model with T2K's constraints on δ CP and the mass hierarchy after removing (profiling out) the effect of other oscillation parameters. The expected impact of the T2K model on the atmospheric neutrino sensitivity to the mass hierarchy is illustrated in Figure 13. For all assumed values of sin 2 θ 23 , the T2K model's constraint on the atmospheric mixing parameters strengthens the sensitivity.
It should be noted that other long-baseline neutrino experiments have made precision measurements of atmospheric mixing parameters, which, when adapted as external constraints in this analysis, could improve the expected sensitivity in the same manner as T2K. For example, as seen in Fig. 15, MINOS [35] constrains ∆m 2 32 roughly as precisely as T2K, although T2K constrains sin 2 θ 23 better. Moreover, the neutrino interactions in MINOS are on iron nuclei, not water, introducing an uncancelled systematic uncertainty. Measurements by NOvA [36,37] of muon neutrino disappearance and electron neutrino appearance should benefit the present analysis; their inclusion is anticipated in a future effort.

Analysis
After the introduction of external constraints the atmospheric neutrino data are analyzed in two ways using modified versions of the fitting scheme outlined in Section IV. In the first analysis the same atmospheric neutrino data samples and binning are fit over a restricted parameter space, with sin 2 θ 13 constrained to 0.0219 as described above and other parameter ranges unchanged. An additional systematic error parameter representing the effect of the uncertainty in external measurements of θ 13 on the SK analysis samples is included in the fit.
The second analysis imposes the same constraint but introduces additional analysis bins and systematic errors to accommodate the T2K analysis samples described above. Using this model of the T2K samples the analysis is performed over the same oscillation parameter grid and does not rely on knowledge of T2K's published likelihood surface. Systematic error parameters for the T2K samples are fit simultaneously with those for the atmospheric neutrino samples.

Results and Discussion
Constraints on the atmospheric neutrino mixing parameters and δ CP in the θ 13 -constrained fit without the T2K samples are shown in Figure 14. As in the unconstrained fit the data prefer the normal hierarchy over the inverted hierarchy with ∆χ 2 ≡ χ 2 N H,min − χ 2 IH,min = −4.34. While the best fit value of |∆m 2 32 | has shifted slightly, it is within errors of the unconstrained fit and in good agreement with other measurements (c.f. Figure 15). Similarly, the preference for the second octant of θ 23 remains unchanged and no significant change is seen in the width of the parameter's allowed region at 1σ. The best fit value of δ CP is 4.19 for both hierarchies, with a tighter constraint on other values relative to the unconstrained fit. Parameter values and their 1σ errors are summarized in Table V. In the second fit the addition of the T2K samples is expected to improve the constraint on the atmospheric mixing parameters due to T2K's more precise measurements. The left two panels of Figure 16 show onedimensional constraints on these parameters and twodimensional contours appear in Figure 17. In the latter dotted lines denote the allowed region from the θ 13constrained fit to the atmospheric neutrino data only and dashed lines show the allowed regions from the T2K model fit by itself. The combination of the two data sets, depicted as the solid line, shows that the fit to these parameters is dominated by the T2K model, with little improvement seen in the contour when fit together with atmospheric neutrinos.
With less freedom to adjust the atmospheric mixing parameters, the combination of atmospheric neutrinos with the T2K model is expected to improve the mass hierarchy sensitivity on average (see Figure 13). By itself, the T2K model favors the normal hierarchy by ∆χ 2 = −0.85 [29]. When combined with atmospheric neutrinos, the hierarchy preference strengthens to ∆χ 2 = −5.19.
Similar preferences in both samples for δ CP near 3π/2 result in a stronger constraint on this parameter when analyzed together. The right panel of Figure 16 shows the constraint for both hierarchy assumptions, with the offset in the two lines corresponding to the ∆χ 2 between the two. Naturally, this preference is consistent with an increased ν e (as opposed toν e ) rate in T2K relative to the expectation from the measured value of θ 13 . Though the constraint from the normal hierarchy fit disfavors the region around π/2, the contour includes the CP-conserving The Super-K contour (cyan) is taken from the analysis with sin 2 θ13 assumed to be 0.0219 ± 0.0012. Contours from the T2K (violet) [8], NOvA (dashed green) [7], MINOS+ (dashed blue) [35], and IceCube (red) [38] experiments are also shown.

VI. INTERPRETATION
It is known that the significance of a mass hierarchy determination does not necessarily follow the expectation from a comparison of the χ 2 minima from each of the hierarchy hypotheses (c.f. reference [39]). Indeed, the hierarchies do not form a nested hypothesis and as a result Wilks' theorem [40] is not applicable. To address the issue of the hierarchy significance in the present analysis, ensembles of pseudo data sets generated from the atmospheric neutrino MC are used to estimate p-values for obtaining a difference in χ 2 between the hierarchy hypotheses more extreme than that observed in data. This condition is termed "rejecting" the alternative hierarchy hypothesis for a given hierarchy assumption in what follows.
For the Super-K analysis, two important issues need to be considered. First, as shown in Figure 13 the expected sensitivity to the mass hierarchy is a strong function of the underlying oscillation parameters and as such, pvalue calculations are expected to depend heavily on the parameters assumed in the generation of MC ensembles. Rather than attempting a Bayesian-like treatment of the p-value calculation and marginalizing over the effect of each parameter, a range of p-values has been computed using the 90% C.L. intervals obtained from the present analysis to avoid ambiguities surrounding the choice of parameter priors. Second, it is also clear from the figure that at the current level of statistics, Super-K has only modest sensitivity to reject either hypothesis, making the interpretation of the p-value susceptible to fluctuations of the background. While the p-value for rejecting the inverted hierarchy (IH) hypothesis assuming the normal hierarchy (NH) may be unlikely, the p-value in the reverse scenario may be equally unlikely, leading to an overestimation of the significance when stated in terms of the first p-value only. Following the lead of the LHC experiments, this issue is treated using the CL s method [41], where Here p 0 (IH) (p 0 (N H)) represents the p-value for obtaining a difference in the minimum χ 2 values between both hierarchy hypotheses, ∆χ 2 ≡ χ 2 N H −χ 2 IH smaller (larger) than that from the data, ∆χ 2 data , assuming the true hierarchy is the IH (NH). While CL s does not behave as a fully frequentist p-value, it is a conservative method of preventing erroneous rejection of the null hypothesis when the overall sensitivity is limited.
MC ensembles were generated assuming statistical fluctuations of the pseudo data sets according to the current detector exposure, and Gaussian fluctuations of the systematic errors. Figure 18 shows the distribution of MC ensembles used in the calculation of the CL s value for the SK θ 13 -constrained fit. Table VI shows the range of p-values and CL s values based on ensembles generated with true oscillation parameters taken from the 90% C.L.  Here NH (IH) refers to the normal (inverted) hierarchy fit. The terms "Free" and "Constrained" refer to fits without and with a constraint on sin 2 θ13, respectively, as described in the text. The expected absolute χ 2 value for the SK (SK+T2K) fits is 559.9 (636.2). The p-value for obtaining a smaller χ 2 than the data is 0.437 (0.482) in the NH θ13-constrained fits.  18. Distributions of the difference in best fit χ 2 values between normal-and inverted-hierarchy fits to pseudo data sets used in the generation of the CLs value for the SK θ13 constrained analysis. In the cyan (orange) histogram the pseudo data have been generated assuming the normal (inverted) hierarchy at the analysis best fit shown in Table V. Shaded portions of the histograms denote the fraction of pseudo data sets with more extreme values than that observed in the data, ∆χ 2 data = −4.34.
bounds on θ 23 and δ CP and best fits from the analyses above. Since the data's preference for the normal hierarchy is driven primarily by upward-going excesses seen in hierarchy-sensitive e-like samples, smaller values of p 0 (IH) and larger CL s are obtained when assuming smaller values of sin 2 θ 23 or when δ CP is near π/2 since both of these regions predict the least amount of electron neutrino appearance. For sin 2 θ 23 > 0.60 both metrics decrease as there is sufficient electron neutrino appearance to discriminate between the two hierarchy hypotheses at the level seen in the data. In contrast, both metrics are found to vary only slightly with ∆m 2 32,31 .

VII. CONCLUSION
Analysis of Super-Kamiokande atmospheric neutrino data over a 328 kton-year exposure of the detector indi-cates a weak preference for the normal mass hierarchy, disfavoring the inverted mass hierarchy at 93.0% assuming oscillation parameters at the analysis best fit point and preferring matter over vacuum oscillations by 1.6σ. Assuming the normal mass hierarchy the constraints on the atmospheric mixing parameters are sin 2 θ 23 = 0.588 +0.031 −0.067 and ∆m 2 32 = 2.50 +0.13 −0.20 , with δ CP = 4.19 +1.37 −1.59 . Fitting in conjunction with a model of the T2K experiment generally enhances these constraints and the preference for the normal mass hierarchy. Over the range of parameters allowed at 90% C.L. the inverted mass hierarchy is disfavored by between 80.6% and 96.7% for SK by itself and by between 91.5% and 94.5% when SK is combined with T2K for the θ 13 -constrained fits.    [27] 0.34 4.6 sin 2 (θ13) [27] 0.11 5.4 a Difference from the Nieves [24] model is set to 1.0 b Difference from CKMT [42] parametrization is set to 1.0 c Difference from GRV98 [43] is set to 1.0 d Difference from the Hernandez [44] model is set to 1.0 e Difference from NEUT without model from [24] is set to 1.0 TABLE VIII. Neutrino interaction, particle production, and PMNS oscillation parameter systematic errors that are common to all SK run periods. The second column shows the best fit value of the systematic error parameter, j , in percent and the third column shows the estimated 1σ error size in percent.   TABLE IX. Systematic errors that are independent in SK-I, SK-II, SK-III, and SK-IV. Columns labeled 'fit' show the best fit value of the systematic error parameter, j , in percent and columns labeled σ shows the estimated 1σ error size in percent.