Combined sensitivity to the neutrino mass ordering with JUNO, the IceCube Upgrade, and PINGU

The ordering of the neutrino mass eigenstates is one of the fundamental open questions in neutrino physics. While current-generation neutrino oscillation experiments are able to produce moderate indications on this ordering, upcoming experiments of the next generation aim to provide conclusive evidence. In this paper we study the combined performance of the two future multi-purpose neutrino oscillation experiments JUNO and the IceCube Upgrade, which employ two very distinct and complementary routes towards the neutrino mass ordering. The approach pursued by the $20\,\mathrm{kt}$ medium-baseline reactor neutrino experiment JUNO consists of a careful investigation of the energy spectrum of oscillated $\bar{\nu}_e$ produced by ten nuclear reactor cores. The IceCube Upgrade, on the other hand, which consists of seven additional densely instrumented strings deployed in the center of IceCube DeepCore, will observe large numbers of atmospheric neutrinos that have undergone oscillations affected by Earth matter. In a joint fit with both approaches, tension occurs between their preferred mass-squared differences $ \Delta m_{31}^{2}=m_{3}^{2}-m_{1}^{2} $ within the wrong mass ordering. In the case of JUNO and the IceCube Upgrade, this allows to exclude the wrong ordering at $>5\sigma$ on a timescale of 3--7 years --- even under circumstances that are unfavorable to the experiments' individual sensitivities. For PINGU, a 26-string detector array designed as a potential low-energy extension to IceCube, the inverted ordering could be excluded within 1.5 years (3 years for the normal ordering) in a joint analysis.

The ordering of the neutrino mass eigenstates is one of the fundamental open questions in neutrino physics. While current-generation neutrino oscillation experiments are able to produce moderate indications on this ordering, upcoming experiments of the next generation aim to provide conclusive evidence. In this paper we study the combined performance of the two future multi-purpose neutrino oscillation experiments JUNO and the IceCube Upgrade, which employ two very distinct and complementary routes towards the neutrino mass ordering. The approach pursued by the 20 kt medium-baseline reactor neutrino experiment JUNO consists of a careful investigation of the energy spectrum of oscillatedνe produced by ten nuclear reactor cores. The IceCube Upgrade, on the other hand, which consists of seven additional densely instrumented strings deployed in the center of IceCube DeepCore, will observe large numbers of atmospheric neutrinos that have undergone oscillations affected by Earth matter. In a joint fit with both approaches, tension occurs between their preferred mass-squared differences ∆m 2 31 = m 2 3 − m 2 1 within the wrong mass ordering. In the case of JUNO and the IceCube Upgrade, this allows to exclude the wrong ordering at > 5σ on a timescale of 3-7 years -even under circumstances that are unfavorable to the experiments' individual sensitivities. For PINGU, a 26-string detector array designed as a potential low-energy extension to IceCube, the inverted ordering could be excluded within 1.5 years (3 years for the normal ordering) in a joint analysis.

I. INTRODUCTION
The neutrino mass ordering (NMO) is one of the most important questions to be solved in the field of neutrino oscillation physics. It impacts the limit on the (absolute) incoherent sum of neutrino masses from cosmology [1], has important implications for the searches for neutrinoless double-β decay [2], and leads to a better understanding of the (flavor-mass) mixing in the lepton sector [3]. In addition, knowledge of the NMO is an important prerequisite for unambiguously measuring leptonic chargeparity (CP) violation [4].
By convention, m 1 is assumed to be lighter than m 2 , and ∆m 2 21 is taken to be the smallest mass-squared difference in magnitude [5]. A global analysis of solar neutrino and KamLAND data yields ∆m 2 21 ≡ m 2 2 − m 2 1 ≈ +7.5 × 10 −5 eV 2 (known to a precision of about 2 %) [6], but leaves us with two possible realizations of the neutrino mass ordering. The "normal" ordering (NO) has m 1 < m 2 < m 3 , whereas the "inverted" ordering (IO) has m 3 < m 1 < m 2 . They are distinguished by the sign of the mass-squared difference ∆m 2 31 (or, equivalently, ∆m 2 32 ), with ∆m 2 31 > 0 in the case of the NO and ∆m 2 31 < 0 in the case of the IO.
The global analysis of atmospheric, accelerator, and medium-baseline reactor neutrino oscillation experiments determines the absolute value |∆m 2 31(32) | ≈ 2.5 × 10 −3 eV 2 to better than 2 % precision [6]. The sign of this larger of the two independent mass-squared differences still remains unknown; only recently have global fits begun to demonstrate a growing preference for the NO [7][8][9][10].
The most sensitive current long-baseline accelerator experiments NOνA and T2K, as well as the atmospheric neutrino experiment Super-K, each show some preference for the NO [11][12][13], through Earth matter effects on ∆m 2 31 -driven oscillations [14][15][16][17]. Subtle synergy effects from medium-baseline reactor experiments without NMO sensitivity (Daya Bay, Double Chooz, RENO) significantly contribute to the global constraint, owing to tension in the preferred values of ∆m 2 31 within the IO [7, Fig. 9]. Whether a 5σ measurement of the NMO will be obtained with a combination of current experiments is unclear. This explains the need for additional experimental efforts [18].
An approach that has so far not been realized consists of exploring subleading survival probability terms arising from the difference between ∆m 2 31 and ∆m 2 32 for electron anti-neutrinos detected at a distance of O(50 km) from a nuclear reactor [19]. This technique, pursued by the medium-baseline reactor experiments JUNO [20] and RENO-50 [21], provides a promising route forward on its own. In addition, it has been shown that its combination with an independent measurement of |∆m 2 31 | by a long-baseline experiment further enhances the NMO sensitivity [22]. An even stronger enhancement can be expected from the combination with next-generation long-baseline oscillation data, collected by atmospheric neutrino oscillation experiments such as Hyper-K [23,24], ICAL@INO [25], ORCA [26] and PINGU [27,28], or the accelerator experiments DUNE [29] and T2HK/T2HKK [30,31]. All of these will rely on more or less pronounced effects of Earth matter on the neutrino flavor evolution in order to determine the NMO.
In this paper we study in detail the expected NMO capabilities of the combination of the medium-baseline reactor experiment JUNO and either the IceCube Upgrade [32] (funded) or PINGU (proposed), which are lowenergy extensions of the IceCube detector that are sensitive to atmospheric neutrino oscillations. The individual published projected NMO sensitivities of JUNO [20] and PINGU [27,28] exceed those of the current-generation oscillation experiments. Even more crucially, as has been shown in [33], the joint fit of the oscillation data from JUNO and PINGU will profit from strong synergy effects. These are brought about by tension between the fit values of ∆m 2 31 within the wrong NMO-similar to but stronger in magnitude than the tension encountered by the current global fits referred to above. Since the underlying neutrino oscillation physics is the same, synergy will also occur between JUNO and the imminent IceCube Upgrade. We will demonstrate explicitly that even in this combination, the synergy is strong enough to decisively exclude the wrong ordering.
JUNO [20] is a 20 kt liquid scintillator (linear alkylbenzene) detector currently under design and construction near Jiangmen in South China. Deployed in an underground laboratory with 700 m overburden, it is designed to measure the disappearance of MeV-energyν e produced by the Yangjiang and Taishan nuclear power plants at a distance of approximately 53 km. At the start of data taking, anticipated in 2021, the number of operational reactor cores is expected to be eight instead of the baseline number of ten. Therefore, we conservatively perform the JUNO analysis with both configurations. In the detector, the reactorν e convert to positrons via the inverse beta decay (IBD) process [34,35] on protons. This results in a characteristic pair of prompt and delayed photon showers, which are detected by about 17 000 20-inch photomultiplier tubes (PMTs). The high number of photoelectrons per event (1200 p.e./MeV) [20] allows for a percent-level neutrino energy resolution. As a result, the NMO can be determined through a precise measurement of the oscillatory fine structure imprinted on the neutrino energy spectrum.
The IceCube-Gen2 facility is the planned nextgeneration extension of IceCube. It will integrate the operating IceCube detector together with four new components: (1) the IceCube-Gen2 optical array complemented by (2) the low-energy core, (3) the radio array, and (4) the surface array [36].
An essential first step in the path towards IceCube-Gen2 is the IceCube Upgrade project [32] (funded and under construction, scheduled for deployment in the 2022/2023 polar season), in the following referred to simply as "Upgrade" for brevity. It provides the initial, highly sensitive low-energy-core array and serves simultaneously as a path-finder mission for the larger IceCube-Gen2 optical high-energy array. A more ambitious detector configuration than the Upgrade array is PINGU [27,28], that would consist of 26 densely instrumented strings. Both the Upgrade and PINGU will in-fill the existing IceCube DeepCore [37] region in deep glacial ice, thereby increasing the neutrino detection efficiency for energies of a few GeV. In this paper we consider the sensitivity of both the Upgrade and the PINGU array.
Our simulation of the Upgrade assumes that its 7 strings are instrumented with optical modules identical in type to the high quantum-efficiency (HQE) DOMs deployed in DeepCore [38,39]. Each string holds 125 modules, spaced 2.4 m apart. The spacing between the strings is about 25 m. In the meantime, some aspects related to detector instrumentation and layout have received updates [32]. For example, the string spacing has been adapted for calibration purposes, and the majority of sensors will be optical modules of the next generation.
In the case of PINGU, we assume that all 26 strings are composed of 192 HQE DOMs. Similar to the Upgrade, the spacing between neighboring strings is approximately 25 m, while the inter-DOM spacing is reduced, at 1.5 m. These detector specifications are identical to those in [27,28].
Independent of design details, both the Upgrade and PINGU will predominantly be sensitive to atmospheric neutrinos undergoing deep inelastic neutrino-nucleon scattering [35], at energies above a few GeV. The Cherenkov light that is emitted by charged final-state particles allows reconstructing the energy and zenith angle of the neutrino. Neutrino oscillation signatures from terrestrial matter effects, here especially the influence of Earth's core and lower mantle, have to be resolved in order to determine the NMO. This paper builds on the original study in [33] which examines the NMO potential of a combined analysis of JUNO and PINGU, comparing it to the experiments' stand-alone capabilities. In order to test if the conclusions drawn in [33] hold under more detailed detector descriptions, we perform comprehensive livetime and parameter scans employing the same MC (where applicable) and the same set of systematic uncertainties that were used by the original design reports [20,28] of the two collaborations. We also perform a combined analysis that includes the Upgrade instead of PINGU and the 8instead of the 10-core JUNO configuration.
Our paper is structured as follows: Chapter II introduces the analysis framework and the modeling of JUNO, the Upgrade, and PINGU in our simulation. Chapter III addresses the statistical approach to the NMO problem and discusses the benefits of performing a combined NMO analysis with JUNO and PINGU. It also demonstrates that similar conclusions hold in the case of a joint analysis of JUNO with 8 reactor cores and the Upgrade. We illustrate the origin of the synergy, contrast the projected combined sensitivities with the stand-alone performances of the respective experiments, and show how truth assumptions impact the NMO discovery potential. Chapter IV briefly examines the sensitivity to oscillation parameters of a combined  analysis in case the NMO is correctly identified; the projected constraints are compared to the precision obtained by a recent global fit. Chapter V summarizes our results and concludes with a short outlook.

II. GENERATION OF EXPECTED SPECTRA
We use the PISA [40] software to compute the expected experimental outputs of JUNO, the Upgrade, and PINGU. As depicted in Fig. 1, each experiment is modeled by means of a series of subsequent stages: 1. Flux: Outputs the energy-and direction-dependent neutrino flux at the location of the detector in the absence of oscillations.

Oscillation:
Calculates neutrino oscillation probabilities depending on energy and baseline given an input flux and weights the latter accordingly to obtain the oscillated neutrino flux at the detector.
3. Exposure: Uses effective area/mass and livetime/operation time of the detector to convert the oscillated flux into an event number.

Reconstruction and event classification:
Smears the "true" event observables (energy and/or direction) with resolution kernels informed from detailed MC simulation (Upgrade and PINGU) or parametrized estimates (JUNO) and classifies Upgrade and PINGU events according to efficiencies obtained from separate, dedicated MC simulation.
We generate each experiment's event distribution as a histogram in 1d (JUNO) or 3d (Upgrade and PINGU). The common dimension is neutrino energy (E ν ). Upgrade and PINGU events are in addition binned in cosinezenith (cos(θ Z )) and event type ("cascade-like" or "tracklike").

A. JUNO
Following the approach in [20], we use 200 bins equally spaced in energy covering the range 1.8 MeV < E ν < 8.0 MeV for each reactor. For the flux and the oscillation stage we treat each reactor individually due to the different baselines ranging from 52.12 km to 52.84 km. Since JUNO does not distinguish neutrinos produced by different reactors, the binned energy spectra are superimposed after the exposure stage.
a. Flux The differentialν e flux per unit energy from one nuclear power plant's (NPP) reactor core at the detector location is approximated by the relation [20] where W th,core is the thermal power of the core, L core its distance from the detector, and f i , e i , and S i (E ν ) are the fission fraction, the thermal energy released per fission, and theν e energy distribution per fission for the ith isotope, respectively.
The values of f i and e i for the four dominant isotopes in an NPP reactor core are taken from [41] and [42]. For the energy spectra S i (E ν ) we use the approximation given in [43]. In addition, each core's thermal power W th,core and distance from the JUNO detector L core are obtained from [20, Table 2]. This allows us to predict the unoscillated neutrino fluxes from all reactor cores at the location of the JUNO detector.
The systematic flux uncertainties we consider are an uncorrelated relative uncertainty of 0.8 % on the thermal power of each core-proportional to the output neutrino flux-and a correlated uncertainty of 2 % on all cores' overall neutrino flux normalization. These systematic error sources are implemented as Gaussian priors.
We do not include the so-called "bump" observed in various reactor anti-neutrino spectra [44]: we expect the reference detector at the Taishan site, the recently proposed JUNO-TAO (Taishan Antineutrino Observatory) [45,46], to provide a precise measurement of the unoscillated JUNO energy spectrum.
b. Oscillation The calculation of theν e survival probability P ee (E ν ; L core ) for a givenν e energy E ν and oscillation baseline L core is performed using the Prob3++ code [47]. We include Earth matter effects, whose impact on the survival probability is at the 1 % level for JUNO baselines [48].
The oscillatedν e flux impinging on the JUNO detector from one reactor core at the distance L core follows from the unoscillated flux Φ 0 as (2) c. Exposure The rate of detectedν e events per unit energy is obtained as the sum over the contributions from all ten reactor cores, where we have dropped the explicit dependence on the set of baselines {L core } on the left. A eff (E ν ) is the detector's effective area, which corresponds to the product of the number of target protons N p and the IBD cross section σ IBD (E ν ), corrected by the selection efficiency , We evaluate Eq. (4) assuming N p = 1.54 × 10 33 (chosen by matching the IBD event rate to that reported in [20], see below) and substituting the IBD cross section σ IBD (E ν ) from [34]. Eq. (3) gives the differential event spectrum before reconstruction. Without selection cuts ( = 1), integrating Eq. (3) over the considered energy range yieldsṄ =1 IBD ∼ 83 d −1 [20] for the 10-core JUNO configuration. Including selection cuts, the IBD selection efficiency is given as 73(1) % [20, Table 3]. This results in aν e rate ofṄ IBD ∼ 60 d −1 . In order to account for reactor and detector downtime, we assume an effective livetime of 300 days per year of data taking [22].
d. Reconstruction JUNO determines the neutrino energy via the visible energy E vis of the prompt IBD positron, which corresponds to the neutrino energy reduced by 0.784 MeV: E vis = E ν − 0.784 MeV. Since JUNO aims to achieve an IBD positron visible-energy resolution of 3 % at E vis = 1 MeV, each spectral bin is "smeared" with the corresponding energy uncertainty by convolving the true event distribution in E ν with a normal distribution of standard deviation ∆E(E ν ). e. Background sources Several background sources contribute to JUNO's event distribution. Besides the two long-baseline nuclear power plants Daya Bay and Huizhou, accidental backgrounds, fast neutrons, 9 Li/ 8 He decays, 13 C(α, n) 16 O reactions, and geoneutrinos are relevant. All of these can mimic a reactorν e IBD event. Daya Bay and Huizhou [20, Table 2] are added to the simulation in the same manner as the signal-producing reactor cores, leading to 5 additional selected reactorν e events per day in JUNO. The rates of the other backgrounds are extracted from [20,Fig. 19] and are added after reconstruction. Note that the 9 Li/ 8 He rate is reduced by 97.7 % [20] due to the application of a cosmogenic veto. The corresponding loss of IBD events is included in the selection efficiency quoted above.
f. Systematics summary Table I summarizes all the systematic parameters applied in the simulation of the JUNO event distribution. Here, the uncorrelated shape uncertainty of the output event histogram is realized through a modification of the χ 2 definition which we minimize numerically in the NMO analysis (see Eq. (11)). systematic error source uncertainty fit uncorrelated reactor flux normalization 0.8 % correlated reactor flux normalization 2 % IBD selection efficiency 1 % uncorrelated shape uncertainty 1 % × Table I. Systematic error sources employed in the modeling of the JUNO experiment, together with their assumed 1σ Gaussian uncertainties. The third column specifies whether each systematic error is implemented as a free fit parameter (checkmark) or not (cross).
In a stand-alone analysis which assumes the same oscillation parameter inputs as [20], we obtain an NMO significance of 3.2σ for true NO after 6 years of livetime, compared to 3.3σ in [20, Table 4 ("standard sens.")]. Due to more recent global inputs for the oscillation parameters [7,8] (reproduced in Table III for convenience), the analysis presented in this work finds a nominal standalone JUNO sensitivity that is reduced 1 to 2.8σ after 6 years. Assuming the NO to be true, Fig. 2 compares the expectedν e energy spectrum (without any backgrounds) after 6 years resulting from these recent global inputs to the one based on the oscillation parameter assumptions made in [20].
g. Reduced reactorν e flux Since it is possible that two of the ten nominal JUNO signal reactor cores (Taishan 3 & 4) will not be in operation at the start of data taking, we also study a conservative scenario in which 1 The main contributions to this reduction are brought about by shifts in ∆m 2 31 and ∆m 2 21 . Eν,reco of reactorνe events in JUNO given true NO for our analysis binning after 6 years of operation (1800 days), without any background contribution. Shown is the expected spectrum assuming the nominal oscillation parameter inputs from [20] (thin orange line) and the spectrum assuming the inputs from In the case of the Upgrade we choose 10 bins linear in the reconstructed neutrino cosine zenith cos(θ Z,reco ), covering the range −1 ≤ cos(θ Z,reco ) ≤ 0 of Earthcrossing (from vertically up-going to horizontal) trajectories. Reconstructed neutrino energies E ν,reco are binned using 10 logarithmically spaced bins spanning 3 GeV ≤ E ν,reco ≤ 80 GeV. Due to the better resolutions of PINGU compared to the Upgrade, the number of bins in energy is raised to 39, with the lower bound reduced to E ν,reco = 1 GeV, in analogy to [28]. Similarly, the number of bins in cosine zenith is raised to 20.
a. Flux Atmospheric neutrino fluxes at the detector site at the South Pole are obtained from tabulated MC simulations performed by Honda et al. [49]. For each neutrino type, 2 these tables provide average unoscillated fluxes across bins in neutrino energy, cosine zenith angle, and azimuth. Starting from two-dimensionalazimuth-and season-averaged-tables, we evaluate the flux of each atmospheric neutrino species on a (200×390) grid in (cos(θ Z ), E ν ) via an integral-preserving interpolation [40]. Systematic flux uncertainties which we consider include the neutrino energy scale, to which we assign an uncertainty of 10 %, the ratio between electron and muon neutrino fluxes, assuming an uncertainty of 3 %, the ratio between neutrino and anti-neutrino fluxes, with an uncertainty of 10 %, as well as a shift of the spectral index with respect to the nominal flux model, assuming an uncertainty of 5 %.
b. Oscillation The NMO signature in IceCube depends on the Earth matter density distribution through its impact on the neutrino oscillation probabilities [14][15][16][17]. In order to calculate the latter we again use the Prob3++ code together with the Preliminary Reference Earth Model (PREM) [50], with a division of the Earth into 12 constant-density layers. The electron fractions in the Earth's core and mantle are assumed to be Y c e = 0.4656 and Y m e = 0.4957, respectively. The oscillation baseline for a given trajectory is determined by assuming a neutrino production height in the atmosphere of 20 km [40] and a detector depth of 2 km. Together with the Earth's radius of about 6371 km and the neutrino zenith angle, these fully determine the neutrino trajectory and thereby the traversed density profile, which in turn is required to calculate oscillation probabilities for a given neutrino energy. Since the intrinsic atmospheric flux of tau neutrinos is negligible at the energies of interest to the NMO [51], the relevant transition channels that need to be considered are ν e → ν e,µ,τ and ν µ → ν e,µ,τ . The oscillated flux (in a bin centered on log 10 E ν /GeV and cos(θ Z )) of atmospheric neutrinos of flavor β impinging on the detector is then obtained from the sum over the unoscillated fluxes Φ 0,α (E ν , cos(θ Z )) of all initial flavors α, weighted by the probabilities P αβ (E ν , cos(θ Z )) of the flavor transitions α → β (note that we drop the explicit dependence on energy and cosine zenith for brevity on both sides of Eqns. (6), (7)): c. Exposure Given an incoming oscillated flux of neutrinos of flavor β, Φ osc,β , the resulting event rate for a given interaction channel (charged current (CC) or neutral current (NC)), is calculated for each bin in true neutrino energy E ν and cosine zenith cos(θ Z ). Here, A eff,β (E ν , cos(θ Z )) is the effective area of the detector for the given neutrino species and interaction type. The scaling factor s A represents a universal systematic uncertainty on all effective areas. The effective area functions A eff,β for both the Upgrade and PINGU are obtained by binning MC events from detailed detector simulations [28] weighted by their individual effective-area weights on a (20 × 39) grid in (cos(θ Z ), E ν ). In order to mitigate fluctuations due to limited MC statistics, the histograms are smoothed in both dimensions using Gaussian smoothing followed by spline smoothing [40]. We extract separate effective areas for neutrinos and anti-neutrinos. We similarly separate the individual flavors in the case of CC interactions. Events due to NC interactions, however, are combined into a single effective-area function. They are also treated identically in reconstruction and event classification (see below). Hence, we are left with the event categories ν e CC, ν µ CC, ν τ CC, ν e,µ,τ NC (and the same for antineutrinos). The effective area scale s A serves as a normalization of the combined neutrino and anti-neutrino event rate. It is assigned an uncertainty of 10 %. Based on an uptime of the IceCube detector that routinely exceeds 99 % [52], the assumed effective livetime for the Upgrade and PINGU is 365.25 days per year of data taking.
d. Reconstruction and event classification Having undergone a detailed likelihood reconstruction at the single photon level, the same sets of MC events used to model the Upgrade's and PINGU's effective areas are employed in applying reconstruction resolutions. Here, variable bandwidth kernel density estimation (VBWKDE) mitigates the effects of statistical fluctuations and yields the probability distributions ("smearing kernels") that map from true neutrino energy and cosine zenith onto the corresponding reconstructed observables [40]. The third output dimension is the event class, "cascade-like" or "track-like". It is determined from a pre-calculated score with discrimination power between ν µ +ν µ CC events, which contain a minimum-ionizing muon track, and the rest, which result in a rather spherical light deposition pattern [28]. Neutrinos and anti-neutrinos are grouped together before the resolution and classification distributions are generated.
e. Background sources Since the Earth acts as a shield against up-going cosmic ray muons, and strict cuts are applied to reject the muon flux entering through the surrounding IceCube detector from above, we do not include any background contamination in the event distributions of either the Upgrade or PINGU [28].
f. Systematics summary Table II provides an overview of the systematic uncertainties we consider for the Upgrade and PINGU. Both the energy and effective area scale have been introduced to capture the effects of multiple systematic uncertainties. This is a consequence of the significant amount of MC simulation that would otherwise be required for a detailed study of the various detector-related systematics, such as the overall detection efficiency of the optical modules [28]. Furthermore, data taken with the Upgrade or PINGU will profit from an improved knowledge of the optical properties of the ice, owing to the deployment of novel in-situ calibration devices and a modified hole-drilling method [32]. As a result, we do not expect a degradation of the NMO sensitivity due to ice uncertainties. For a discussion of their impact on the PINGU NMO sensitivity and the NMO measurement with IceCube DeepCore, see also [28] and [53], respectively.
With the oscillation parameter values assumed in [28], systematic error source uncertainty fit atmospheric spectral index shift 5 % νe/νµ flux ratio scale 3 % ν/ν flux ratio scale 10 % energy scale 10 % effective area scale 10 % Table II. Systematic error sources employed in the modeling of the Upgrade and PINGU. The entries are to be interpreted in the same way as those of Table I.  our stand-alone analysis of PINGU yields a median significance 3 of 3.2σ to exclude the IO for true NO after 4 years of data taking, somewhat larger than the value of 2.8σ presented in [28], mostly due to the improved VB-WKDE treatment of the resolution functions [40]. For the Upgrade, in contrast, the are no previously published sensitivities. Figure 3 shows the expected event distribution for 4 years of data taking given the more recent global NO in- 3 For the sake of comparison, in this paragraph only we employ the definition of the median NMO sensitivity given in [28, Eq. (A.13)], which differs slightly from the "standard" sensitivity proxy adopted throughout the remainder of this paper. Note that the sensitivities in this paragraph should not be compared to those shown later in  [7,8]. The second-to-last column ("fit") denotes whether the given parameter is fit (checkmark) or kept fixed (cross).
In case it is fit, the last column shows the range explored by the minimizer with respect to the nominal parameter value.
puts from Table III for the Upgrade as well as for PINGU.

III. EXPECTED NMO SENSITIVITIES IN A COMBINED ANALYSIS
A. Statistical approach a. Sensitivity proxy In the following, all NMO sensitivities are obtained by producing toy data under the ordering assumed to be true (TO)-generated from the oscillation parameter values in Table III (unless stated  otherwise) and with all systematic uncertainties at their nominal values-and fitting the toy data by numerically minimizing a χ 2 function over all free parameters while restricting the minimizer to the wrong ∆m 2 31 half-plane (wrong ordering, WO). Since no statistical fluctuations are applied to the toy data, we refer to it as the "Asimov dataset" [54]. We employ the test statistic where χ 2 NO(IO) ≡ min {pi}∈NO(IO) χ 2 corresponds to the minimum when the fit hypothesis (with free parameters {p i }) is NO (IO). In the Asimov approach, toy data generated under the TO results in χ 2 TO = 0 by construction. Hence, one has We then convert Eq. (9) into a median NMO significance by taking ∆χ 2 . This sensitivity proxy corresponds to a one-sided number of standard deviations at which the WO is excluded. Note that the relation is exact in the case of symmetric Gaussian distributions of the ∆χ 2 test statistic [55], which we have found to hold to a good approximation. 4 b. Nomenclature In the following, combined fit refers to the minimization of the expression where In the three χ 2 contributions (11)-(13), the indices i in Eq. (11) and j in Eq. (12) run over the event histogram bins employed for JUNO and IceCube (either Upgrade or PINGU), respectively, whereas k in Eq. (13) runs over all nuisance parameters subject to external Gaussian constraints. n obs is the measured number of events in the toy data, n exp the expected number, and σ uncorr = 0.01 the uncorrelated reactor flux shape uncertainty. For the kth nuisance parameter subject to an external constraint, a deviation ∆p k from its nominal value is penalized according to the parameter's standard deviation σ k as (∆p k ) 2 /σ 2 k . The set of nuisance parameters considered depends on the experimental configuration. In the case of JUNO, these are the oscillation parameters ∆m 2 31 , θ 12 , θ 13 , and the systematic uncertainties enumerated in Table I. In the case of IceCube, they are ∆m 2 31 , θ 13 , θ 23 , and the systematic uncertainties in Table II. A combined fit includes the union of these two sets of parameters. Oscillation parameters that are kept fixed in the stand-alone or combined fits have little to no impact on the NMO sensitivities we find.
In some cases, the significance derived from evaluating Eq. (11) or (12) at the minimum of the combined fit is given in addition; we refer to it as a given experiment's statistical contribution to the combined significance.
Finally, whenever we show fit parameter scans within the TO, the definition of ∆χ 2 differs from that given by Eq. (9) in that ∆χ 2 corresponds to the local χ 2 TO value (at the considered point within the TO hypothesis parameter space), as opposed to the local χ 2 WO value in the case of fit parameter scans within the WO.

B. Synergy effects
Why is it to be expected that the NMO sensitivity of the combined analysis exceeds the simple sum of the two individual experiments' sensitivities? Here, we discuss PINGU in the combined analysis with JUNO with the aim of clarity in illustrating the synergy effects and to be able to compare our results to [33].
The upper part of Fig. 4 shows the JUNO and PINGU ∆χ 2 profiles as function of the tested value of ∆m 2 31 within the true ordering (NO on the left, IO on the right) for an exposure time of 6 years, minimized over all other parameters at each point. As the scan is performed on the Asimov dataset, the minimal ∆χ 2 value is found at the true value of ∆m 2 31 and vanishes exactly for both JUNO and PINGU; the widths of the minima give the precision to which the two experiments are able to constrain ∆m 2 31 when either the NO or the IO are correctly identified.
The result changes profoundly when the same scan is performed while assuming the wrong ordering in the fit, depicted in the lower panels of Fig. 4. The inherent tension between data and hypothesis, i.e., the wrong sign of ∆m 2 31 , in both JUNO and PINGU results in ∆χ 2 > 0 for all tested values of ∆m 2 31 . In this case, each experiment's minimum ∆χ 2 serves as a sensitivity proxy-the larger it is, the better each experiment distinguishes the wrong ordering from the true one.
Most importantly, the ∆m 2 31 values at the minima (the best-fit values) no longer coincide. For both experiments, in terms of the absolute value |∆m 2 31 | the best fit is smaller than the truth when NO is true and greater when IO is true. While this deviation is always larger for PINGU (and increased by Earth matter effects compared to the hypothetical case of vacuum oscillations), the JUNO fit provides a more pronounced ∆χ 2 minimum, which in addition is much narrower than the shift between the two experiments' best fits. It is precisely this configuration which explains the benefits of the combined analysis: by forcing the simultaneous fit of JUNO's and PINGU's event distributions to assume the same value for ∆m 2 31 within the wrong NMO, the ∆χ 2 minimum grows much beyond the simple sum of the two individual fits' minima. Note that in the absence of tension in the non-displayed nuisance parameters (which is approximately the case here), the best fit in ∆m 2 31 would simply be given by the position of the minimum of the sum of the two experiments' ∆χ 2 profiles (labeled "simple sum" in Fig. 4); the combined NMO sensitivity would be given by the value of ∆χ 2 at the minimum.
The differing ∆m 2 31 dependencies of the NMO measurements by JUNO and PINGU constitute the most pronounced synergy effect of their combined analysis. Similar to the lower part of Fig. 4, Fig. 5 shows the ∆χ 2 profiles as a function of the tested value of sin 2 (θ 13 ) in the wrong ordering, that is, when only allowing the minimizer to explore the region ∆m 2 31 < 0 (> 0) for true NO (IO) at each value of sin 2 (θ 13 ). Here, we only subject θ 13 to the prior of Table III in the lower panels; no prior is used in the upper panels.
In absence of a prior, the sum of the minimal ∆χ 2 of the stand-alone profiles is significantly lower than the minimum of the summed ∆χ 2 curve. The latter in this case represents a combined analysis that only forces the  Figure 4. ∆χ 2 profiles as function of the tested/fit values of ∆m 2 31 within the true ordering (upper panels) and the wrong ordering (lower panels) for a livetime of 6 years for both experiments. On the left (right), the NO (IO) is assumed to be true. The scans within the wrong ordering illustrate the synergy effect of performing a combined fit. Here, a tension in the best-fit values of ∆m 2 31 for PINGU and JUNO is visible that is greater than the "resolution" of the two experiments. Shown in addition are the hypothetical wrong-ordering profiles assuming a vanishing matter density along all neutrino trajectories in the case of PINGU (labeled "PINGU vacuum"). The line labeled "simple sum" (dashed orange) is the sum of the "JUNO" and "PINGU" curves at each tested value of ∆m 2 31 .
two experiments to assume the same θ 13 value within the wrong NMO. Both the NMO sensitivity derived from summing the minima and from the joint fit of θ 13 only are significantly smaller than that obtained in a full joint analysis ("combined"), in which also the values of all other oscillation parameters are required to match. This in turn clearly indicates that the tension in θ 13 is small compared to that in ∆m 2 31 . Even the comparably small beneficial impact due to θ 13 mostly disappears once the latter is assigned the prior from Table III, as shown in the lower panel of Fig. 5. Here, it becomes apparent that summing the stand-alone minima yields approximately the same NMO sensitivity as the joint fit of θ 13 only.
We consider the prior important for studying realistic scenarios and apply it in all other sensitivity calculations for two reasons. PINGU's best-fit values for θ 13 within the wrong ordering are far outside the globally allowed regions [7,8] (and outside the plotting ranges of the top panels of Fig. 5). Also, its ∆χ 2 minima in absence of the prior are visibly reduced.
The fact that IceCube is not sensitive to θ 12 and JUNO is not sensitive to θ 23 means that these parameters do not contribute any additional synergies. Moreover, we expect no other synergy effects to benefit the combined analysis since the two experiments are not assumed to share any systematic uncertainties beside the oscillation parameters discussed above.
C. Sensitivity after 6 years of exposure Table IV gives the NMO sensitivity of the combined analysis for 6 years of data taking with each of the experiments, for both true NO and true IO using the nominal oscillation input models of Table III. As before, we assume the experiments' simultaneous start. The table also shows the experiments' stand-alone sensitivities as well as the sensitivities which each would obtain using the best-fit parameter values of the combined analysis, but not including prior penalties (labeled as "statistical contribution"). On the one hand, this illustrates how the combined analysis profits simply from the tighter priors provided by the other experiment. On the other hand, the statistical contribution reveals the contribution of the synergetic effect to the sensitivity gain, as described in  Figure 5. θ13 synergy effect after 6 years of livetime of both experiments, without (top) and with (bottom) a prior on the parameter, assuming true NO on the left and true IO on the right. In each case, the wrong ordering is fit to the true one. In contrast to Fig. 4, here we show the full combined fit (solid red) in addition to the "simple sum" (dashed orange).
Sec. III B. In addition to the combination of JUNO and PINGU, which is subject to the strong synergy illustrated in the previous discussion, we here also evaluate the corresponding sensitivities based on the smaller-scope experimental setups: JUNO with a reduced number of eight reactor cores and the IceCube Upgrade. Considering the stand-alone 8-core JUNO configuration, the sensitivity is projected to be around 2.5σ for either NMO, while that of the Upgrade is strongly dependent on which of the two input models is assumed to be true: for true NO, we obtain a significance of 3.8σ, whereas for true IO we find 1.8σ. The combined result exceeds the 5σ threshold considered as decisive.
A similar picture emerges when one considers the nominal JUNO setup and PINGU, albeit at a higher significance-level: while we expect JUNO's stand-alone sensitivity to be close to 3σ for either NMO, the PINGU sensitivity in the case of true NO reaches more than 6σ, and 3.5σ for true IO. The combined sensitivity is such that the NMO is established at exceedingly high levels of confidence of 9.6σ for true NO and 7.3σ for true IO.
Comparing the combined significances to the statistical contributions makes evident that the Upgrade and PINGU benefit far more from the oscillation parameter constraints provided by JUNO (in its reduced or nominal source configuration) than vice versa. As pointed out before, the main reason is found in the lower panels of Fig. 4, where the JUNO ∆m 2 31 constraint is stronger than that provided by PINGU. As a result, the combined minimal ∆χ 2 lies close to the position of the minimum preferred by JUNO, creating a stronger tension with the data obtained by PINGU. The same holds true if one considers the 8-core JUNO configuration and the Upgrade instead.
The above effect is also illustrated in Table V, which lists the fit outcomes within the wrong ordering for the  Table V. Best-fit values within the wrong NMO for the free oscillation parameters after 6 years of operation, when fitting each experiment individually as well as for the combined fit ("comb."). For a given true NMO, the column "inj." specifies the injected parameter values, which are also given in Table III. The dagger denotes fit outcomes that correspond to a bound of a parameter's fit range. two stand-alone analyses of JUNO and PINGU and their combined analysis. As expected, JUNO dominates the best-fit values of |∆m 2 31 | and θ 12 , whereas PINGU dominates the outcome in θ 23 .
The combined best fit for θ 13 (subject to the prior) prefers a value appreciably outside the range delimited by the outcomes of the two individual fits. A similar behavior could lead to a shift of the θ 13 best-fit value with respect to the truth when assuming the wrong ordering in a global fit.

D. Livetime evolution of sensitivity
Based on the assumption of a simultaneous start of data taking, Fig. 6 demonstrates the temporal evolution of the NMO sensitivity for a span between 1 year and 6 years of detector livetime, at the end of which the sensitivities reported in Sec. III C are evaluated. The individual experiments' sensitivities are shown together with their simple (quadratic) sum and the combined sensitivities; true NO is depicted in the two leftmost panels, true IO in the two rightmost ones. Again, we investigate the joint analysis of the 8-core JUNO setup and the Upgrade, as well as the JUNO baseline together with PINGU.
One can see that the combination of the 8-core JUNO configuration and the Upgrade reaches 5σ in less than 4 years (true NO) respectively 6 years (true IO). The combined analysis of JUNO and PINGU reaches 5σ within about 1.5 years for true NO and within about 3 years of livetime of both detectors for true IO. Crucially, the comparison between the combined sensitivity and that resulting from the sum of each pair of experiments' ∆χ 2 minima reveals that the synergy effect increases over timethe reason being that the minima in ∆m 2 31 within the wrong ordering become sharper. As a result, the minimal ∆χ 2 grows faster when a full combined analysis is performed. In conclusion, the combined analysis does not only profit from the statistics gain over time but also from an enhancement of the synergy effect itself.

E. Sensitivity dependence on true parameter values
In the following, we discuss the expected impact of the most important truth assumptions in the measurement of  the NMO with JUNO and IceCube, both for the standalone experiments and in the context of the combined analysis. We focus on the dominant parameter for each experiment, namely the energy resolution in the case of JUNO and the atmospheric mixing angle θ 23 in IceCube. Similar to Sec. III B, we show our one-dimensional results for the baseline JUNO configuration and PINGU for ease of comparison with existing literature; the same qualitative behavior is observed for the 8-core JUNO setup and the Upgrade (Sec. III F provides the full two-dimensional study for that combination).

JUNO energy resolution
The detector's energy resolution (cf. Eq. (5)) is the most critical parameter in the NMO measurement with JUNO. It is crucial for distinguishing between the rapid small-amplitude variations of theν e energy spectrum brought about by the interference of the ∆m 2 31 -and ∆m 2 32 -driven oscillation modes in each of the possible NMO realizations.
The statement above is emphasized in Fig. 7, which shows the dependence of the 6-year sensitivity on the energy resolution for JUNO alone as well as for the different types of combined analyses considered throughout this work. As expected, JUNO's sensitivity decreases as the energy resolution worsens. The projected median significance ranges from below 2σ to just above 5σ for the considered range in energy resolution (2 % to 4 % at a visible energy of 1 MeV). The fact that the PINGU sensitivity within the combined analysis decreases with worse energy resolution in JUNO arises due to the weakening JUNO constraint on ∆m 2 31 within the wrong NMO. This effect is less prominent in the full combined analysis, however: JUNO profits more strongly from the combined analysis when its energy resolution worsens, as indicated by the separation between the solid black and dashed red lines. As the JUNO constraint on ∆m 2 31 gains precision with improving energy resolution, the best-fit value of ∆m 2 31 for the combined analysis moves toward that preferred by JUNO, resulting in JUNO benefiting less from the combination.
The main takeaway, however, is that the combined sensitivity is rather stable with respect to the JUNO energy resolution: when a joint fit is performed, the significance lies well above the 5σ threshold even for the worst energy resolutions tested here.
Note that residual uncertainties in the calibration of the energy scale of the JUNO detector can lead to a reduced NMO sensitivity. As demonstrated in [22], this potential ambiguity can be effectively reduced by a selfcalibration based on the oscillation pattern observed in the reactorν e spectrum. However, even without selfcalibration and for a scenario in which the wrong NMO becomes preferred over the true NMO, we have verified that the combined analysis of JUNO and PINGU delivers a high-significance rejection of the wrong NMO.

Mixing angle θ23
Driving the overall strength of the observed oscillation signal, θ 23 is the dominant parameter regarding Ice-Cube's NMO sensitivity. Figure 8 shows the dependence of the 6-year sensitivity on sin 2 (θ 23 ) for PINGU alone, for the combined analysis of both experiments, and for their respective contributions to the combined sensitivity. JUNO's stand-alone NMO sensitivity is not shown since its event spectrum is unaffected by θ 23 .
For the considered range in sin 2 (θ 23 ), PINGU's projected median significance ranges from around 3σ to 7σ when NO is true, and from around 3σ to 5σ when IO is true. The behavior of PINGU within the combined analysis is nearly the same as in the stand-alone case, with the NMO sensitivity being shifted to a higher value due to the synergy effect. This is not obvious because PINGU is not free to choose the values for ∆m 2 31 and θ 13 anymore, which are dominated (∆m 2 31 ) respectively affected (θ 13 ) by JUNO. Also within the combined analysis, JUNO's contribution to the NMO sensitivity is only barely affected by the true value of sin 2 (θ 23 ). This result is not necessarily obvious either because PINGU's dependence on sin 2 (θ 23 ) could lead to shifts in the bestfit values for other oscillation parameters and therefore indirectly affect the JUNO result as well.
As a result of the considerations above, the behavior of the projected combined sensitivity is similar to PINGU's, though somewhat more stable with respect to the true value of sin 2 (θ 23 ). Again, when a joint fit is performed, the significance lies well above the 5σ threshold even for the least favorable values of sin 2 (θ 23 ) tested here.
F. NMO potential for JUNO with eight cores and the Upgrade Assuming 5σ as the target sensitivity, neither the Upgrade, nor the 8-core JUNO configuration, nor the simple sum of their stand-alone sensitivities is expected to lead to a decisive, > 5σ, determination of the NMO (∼ 5 years of joint operation). However, the boost in sensitivity due to a combined fit is so substantial that this target is attainable, cf. Fig. 6 for our nominal truth assumptions about the JUNO energy resolution and sin 2 (θ 23 ).
Going beyond the nominal scenario for this pair of parameters, Fig. 9 explores the corresponding twodimensional parameter space and shows the time needed to obtain a significance of 5σ in a combined analysis of the 8-core JUNO configuration and the Upgrade at each point. Here, the nominal values are marked by the two orthogonal lines, and the nominal point by the empty square. The dependencies roughly follow the behavior of the combined sensitivity of JUNO and PINGU shown as a function of the true value of each of the parameters separately in Fig. 7 and Fig. 8. For true NO on the left, the least favorable point is located in the upper left hand corner, where the true value of sin 2 (θ 23 ) is smallest (∼ 0.43) and JUNO's energy resolution is worst (∼ 4 %). Here, the NMO can only be determined after about 7 years of data taking. Conversely, when sin 2 (θ 23 ) ≈ 0.63 and the JUNO energy resolution amounts to 2 %, the required time is reduced to around 2.5 years. For true IO on the right, the least favorable value of sin 2 (θ 23 ) approximately coincides with our nominal assumption; depending on the JUNO energy scale, the time to determine the NMO ranges from below 4 years to more than 7 years. The most favorable scenario is again located in the lower right hand corner, where less than 3 years of measurement are expected to suffice in order to determine the NMO.

IV. COMBINED OSCILLATION PARAMETER SENSITIVITIES
The synergy effect on the NMO sensitivity discussed throughout the previous chapter could imply a similar gain in the measurement of the oscillation parameters to which both JUNO and IceCube are sensitive, i.e., θ 13 and ∆m 2 31 . In this chapter we again focus on a combined analysis of JUNO (with ten cores) and PINGU (instead of the Upgrade) to cover the most powerful scenario.
Assuming the NMO has been correctly identified, the two panels in Fig. 10   well as their combined sensitivity to sin 2 (θ 13 ) in the absence of external constraints on this mixing angle, for true NO on the left and true IO on the right. In both cases, we superimpose the current global ∆χ 2 constraint [7,8] in the same plot. As expected, the combined analysis is able to measure sin 2 (θ 13 ) with a slightly higher precision than that obtained via the simple sum of the two stand-alone profiles. It does not yield an improvement over current global constraints though. Fig. 11 shows the analogous information for ∆m 2 31 . In this case, the stand-alone JUNO measurement outperforms the global constraints, and so does PINGU's measurement-albeit to a lesser extent. The combination of JUNO and PINGU, however, does not lead to any further gain in precision compared to that obtained by JUNO alone, no matter whether one takes the simple sum or performs a combined fit.
In summary, as opposed to the case of the NMO measurement, the combined analysis of JUNO and PINGU does not lead to a significant enhancement of the constraints on the oscillation parameters θ 13 and ∆m 2 31 . In the former case the existing global constraints are stronger, whereas in the latter case the projected JUNO stand-alone sensitivity is the same as that of the combined fit. We have verified that these conclusions apply identically also to the combined analysis of JUNO with eight cores and the Upgrade.

V. CONCLUSION
In this paper we investigate the potential of a combined neutrino mass ordering analysis of the data from the fu-ture reactor neutrino experiment JUNO and the future atmospheric neutrino experiments IceCube Upgrade and PINGU.
Owing to the different positions of the minima in the oscillation parameter space within the wrong-ordering hypothesis, the combined analysis of JUNO and the Ice-Cube Upgrade or PINGU achieves an NMO sensitivity which exceeds the purely statistical combination of their stand-alone sensitivities. This synergy effect is based on the distinct experimental setups-for which the dominant neutrino oscillation channels differ, as well as neutrino energies, baselines, and the relevance of Earth matter effects-and is most prominent in ∆m 2 31 . Both JUNO and PINGU will have sensitivity to this mass-squared difference much beyond that of the existing generation of oscillation experiments or their combination. In this regard, the IceCube Upgrade and PINGU benefit far more from the combined analysis than JUNO. The reason is that JUNO's precise ∆m 2 31 constraint within the wrong ordering is the actual driver of the synergy effect.
It should be stressed that the greatest sensitivity benefit is achieved via the combination of JUNO with an NMO-sensitive long-baseline experiment. The synergy here arises from the complementarity between the two measurement methods, i.e., sub-dominant vacuum oscillations versus long-baseline oscillations enhanced by matter effects. Both of these methods result in distinct oscillation patterns when comparing the two possible neutrino mass ordering realizations. Substantially smaller benefit is expected when including more long-baseline experiments.
Our studies demonstrate that the combined analysis of JUNO with eight reactor cores and the IceCube Upgrade   Figure 11. ∆χ 2 profiles as a function of the tested values of ∆m 2 31 within the true ordering for JUNO and PINGU stand-alone, their simple sum, and their combination, for a livetime of 6 years of both experiments. On the left (right), the NO (IO) is assumed to be true. The current global sensitivity to ∆m 2 31 is superimposed (dotted black) [7,8].
is projected to result in a significance of 5σ within approximately 3 years to 7 years of livetime. In the most promising case, corresponding to the combined analysis of JUNO with its full reactor configuration and PINGU, an NMO significance of 5σ can be reached with less than 2 years of data taking. Thus, in brief, a combined analysis with JUNO and IceCube will determine the neutrino mass ordering at a significance beyond the 5σ level within the expected operation times of both experiments, even for a more conservative scenario and for unfavorable regions of parameter space.
Finally, we note that a combined measurement of the oscillation parameters of the PMNS paradigm [56], such as sin 2 (θ 13 ) or ∆m 2 31 , does not significantly improve the stand-alone capabilities or the measurements with the existing generation of experiments.