Statistical significances and projections for proton decay experiments

We study the statistical significances for exclusion and discovery of proton decay at current and future neutrino detectors. Various counterintuitive flaws associated with frequentist and modified frequentist statistical measures of significance for multi-channel counting experiments are discussed in a general context and illustrated with examples. We argue in favor of conservative Bayesian-motivated statistical measures, and as an application we employ these measures to obtain the current lower limits on proton partial lifetime at various confidence levels, based on Super-Kamiokande's data, generalizing the 90\% CL published limits. Finally, we present projections for exclusion and discovery reaches for proton partial lifetimes in $p \rightarrow \overline \nu K^+$ and $p \rightarrow e^+ \pi^0$ decay channels at Hyper-Kamiokande, DUNE, JUNO, and THEIA.


I. INTRODUCTION
In order to account for the observed matter-antimatter asymmetry in our universe, baryon number must be violated as required by the Sakharov conditions [1]. Although baryon number is a global symmetry of the (renormalizable) Standard Model (SM) Lagrangian, it may be violated by non-perturbative electroweak sphaleron effects (as yet unconfirmed by experiment) that are heavily suppressed at temperatures much lower than the electroweak scale [2,3]. The sphaleron effects, however, together with the CP-violation in the electroweak sector are not sufficient to explain the observed baryon asymmetry, and therefore provide a key motivation for theories beyond the SM with additional B-violation. Grand unified theories (GUTs), with or without supersymmetry, are well-motivated and generically predict baryon number violation, and therefore can lead to proton decay. After integrating out the heavy fields, the non-renormalizable operators built out of the SM fields that allow proton decay are of dimension six or higher, with the suppression scale of order the GUT breaking scale.
In order to project the exclusion and discovery reaches, it is necessary to make choices regarding the statistical tools to be employed. Indeed, the results for such projections are only meaningful in the context of those choices. Here, we are interested in counting experiments with multiple independent channels with different signal rates and backgrounds, with uncertainties.
Our statistical analysis choices are guided by several requirements.
• We aim for statistical measures that avoid reporting an exclusion or discovery when the experiment is actually not sensitive to the physics signal hypothesis under investigation. As we will discuss, pure frequentist statistics can suffer from this problem.
• We choose statistical measures such that the presence of a non-informative channel (one with a much higher background and/or a much lower signal rate than other channels) does not unduly affect the exclusion or discovery conclusion.
• We avoid statistical measures that contain the subtle flaw that they could counterintuitively imply a greater sensitivity for an experiment if it increases its background.
Regarding this last point, in a previous paper [50], we have discussed the fact that the median expected significance for discovery or exclusion has just such a counterintuitive flaw in the context of frequentist p-values for a single-channel counting experiment. We proposed a solution to that problem. As we will see below, this type of problem also occurs in the case of multi-channel counting experiments, and can be avoided using Bayesian-motivated statistical measures. For these reasons, Section II of this paper is devoted to a rather extensive discussion of the statistical issues associated with multi-channel counting experiments with background and nuisance parameter uncertainties, in which we highlight some of the problems that can occur and explain our choices of statistical tools in a general context. In Section III we apply these statistical measures to discuss the present exclusions from Super-Kamiokande, and we project exclusion and discovery prospects for proton decay at DUNE, JUNO, Hyper-Kamiokande, and THEIA, for the proton decay modes $p \rightarrow e^+ \pi^0$ and $p \rightarrow \overline\nu K^+$. Section IV summarizes our findings for exclusion and discovery prospects for run-times of 10 and 20 years.

A. Basic definitions
In this paper we are concerned with new physics signals and backgrounds, which are both assumed to occur as random discrete events governed by Poisson statistics, possibly in multiple independent channels. In general, given data resulting from an experiment, the significance of a possible exclusion or discovery can be given in terms of a p-value, defined as the probability of obtaining a result of equal or greater incompatibility with a null hypothesis H_0. In high-energy physics, the p-value is often conventionally reported as a significance, defined by

Z ≡ √2 erfc^{−1}(2p), (2.1)

which in the special case of a Gaussian distribution would coincide with the number of standard deviations.
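Eq. (2.1) can be evaluated without any special-function library by inverting erfc numerically. The following is a minimal sketch in plain Python, using bisection on the standard-library math.erfc (the function name z_from_p is ours, not from the paper):

```python
import math

def z_from_p(p):
    """Significance Z = sqrt(2) * erfcinv(2p), eq. (2.1).

    erfcinv is not in the Python stdlib, so we solve erfc(x) = 2p for x
    by bisection (erfc is monotonically decreasing), then Z = sqrt(2) * x.
    """
    lo, hi = -10.0, 40.0      # erfc(-10) ~ 2, erfc(40) underflows to 0
    target = 2.0 * p
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > target:
            lo = mid          # root lies to the right
        else:
            hi = mid
    return math.sqrt(2.0) * 0.5 * (lo + hi)
```

As a consistency check, the conventional thresholds quoted below come out as expected: p = 0.05 gives Z ≈ 1.645, p = 0.00135 gives Z ≈ 3, and p = 2.867 × 10^{−7} gives Z ≈ 5.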
The assumption for discovery is that the null hypothesis is the background-only hypothesis H_0 = H_b, while for exclusion the null hypothesis is a signal-plus-background model H_0 = H_{s+b}. Consider a test-statistic Q defined in such a way that larger Q is more signal-like and smaller Q is more background-like. In a single-channel counting experiment, for example, Q is simply the number of observed events. Then, for an experimental outcome Q_obs, one has the p-value for discovery

p_disc = P(Q ≥ Q_obs | H_b), (2.2)

and the p-value for exclusion

p_excl = P(Q ≤ Q_obs | H_{s+b}). (2.3)

In a frequentist approach, the p-value for a given data outcome is often used to provide a quantitative measure of the credence we give to H_0. However, the p-value cannot be directly interpreted as the probability that the null hypothesis is true, given the data. Nevertheless, small p-values are considered a measure of evidence against H_0 in frequentist statistics. In particle physics, two popular standards for exclusion are to require that p_excl < 0.10 or 0.05, commonly referred to as 90% or 95% exclusion. For rejection of the background-only hypothesis in favor of some new model, a higher standard is almost always required, with either Z_disc > 3 (p_disc < 0.001350) for "evidence", or Z_disc > 5 (p_disc < 2.867 × 10^{−7}) for "discovery". In high-energy physics experiments in the 21st century, starting with the Higgs boson searches at the LEP e^−e^+ collider and continuing with all kinds of searches for new phenomena at the Large Hadron Collider (LHC), it has become very common to use a modified frequentist statistical measure for exclusion, called the CL_s method. This is a more conservative approach to assigning exclusion significances than p_excl. The idea of CL_s [51][52][53][54] is to divide the usual p-value for exclusion by the p-value that would be obtained with the signal assumed absent:

CL_s ≡ P(Q ≤ Q_obs | H_{s+b}) / P(Q ≤ Q_obs | H_b). (2.4)

A specific motivation for using CL_s rather than p_excl is to avoid reporting an exclusion in cases for which the experiment is actually not sensitive to the purported signal hypothesis, but the observed data has a small p-value anyway. This can occur, for example, in a counting experiment if the observed number of events is significantly smaller than the background estimate, as we will discuss in detail shortly. Note that, by design, CL_s is not a p-value or even a probability, but rather a ratio of probabilities. Nevertheless, the exclusion is reported using CL_s in place of the exclusion p-value, so that one reports 95% (or 90%) exclusion if CL_s < 0.05 (or 0.1). Because the denominator is always less than 1, the modified frequentist measure CL_s is always more conservative in reporting exclusions than the frequentist p-value, in the sense that using it reduces the false exclusion rate compared to using p_excl. In the particle physics literature, CL_s was introduced in ref. [51] and detailed (along with its advantages, reviewed and illustrated below) in refs. [52][53][54].
It is also useful to have a counterpart to the p_disc statistic that similarly guards against claiming discovery in situations where the experiment is not sensitive to the signal model. In ref. [55], an approach to discovery significance was proposed using the Bayes factor [56][57][58] of the null hypothesis H_0 = H_b to the alternative hypothesis H_1 = H_{s+b}. For an experiment investigating a putative signal with strength s, the Bayes factor B_01 is (using the probabilities in place of the likelihoods, to which they are proportional)

B_01 = P(Q_obs|H_b) / ∫_0^∞ ds′ π(s′) P(Q_obs|H_{s′+b}), (2.5)

where π(s′) is a Bayesian prior probability distribution for the signal strength. As mentioned in [55], this expression is only meaningful in the case of a prior that is proper, i.e. ∫_0^∞ ds′ π(s′) = 1, since otherwise the arbitrary normalization of an improper prior would make the Bayes factor B_01 also arbitrary. This precludes the use of a flat prior, for example. For a single-channel counting experiment with background mean b, that reference argues in favor of the proper prior π(s′) = b/(s′ + b)^2, referred to as the objective signal prior. However, we find it counterintuitive to use a prior for the signal that depends on the background. Instead, we choose simply π(s′) = δ(s′ − s), expressing certainty in the prediction of the signal model. If the signal model prediction is not perfectly well known, it is straightforward to generalize this with an appropriate π(s′). We therefore define the simple likelihood ratio statistic for the confidence level in the discovery,

CL_disc ≡ P(Q_obs|H_b) / P(Q_obs|H_{s+b}). (2.6)

While various scales have been proposed (see e.g. Jeffreys' in [58] and Kass and Raftery's in [57]) to interpret the Bayes factor as a measure of evidence in favor of or against a null hypothesis, we propose to use CL_disc in place of p in eq. (2.1) to obtain a discovery significance Z, in exactly the same way that a frequentist p_disc would be used. As we will illustrate below, our choice gives results that are always more conservative than the significances obtained from p_disc. This is very similar to the way the modified frequentist measure CL_s is now commonly used in place of p in eq. (2.1) to report an exclusion significance that is always more conservative than that of the standard frequentist method, even though CL_s, like CL_disc, is not a probability.

B. Single-channel counting experiments
To illustrate the statistical methods discussed above, let us consider the special case of a simple experiment that counts the number of events n, with signal and background modeled as independent Poisson processes with means s and b respectively. For a mean μ, the Poisson probability to observe n events is

P(n|μ) = μ^n e^{−μ}/n!. (2.7)

Therefore, in the idealized case of perfectly known background, the p-value for discovery is the probability that data generated under hypothesis H_0 = H_b is equally or more signal-like than the actual observed number of events n:

p_disc(n, b) = Σ_{k=n}^∞ P(k|b) = γ(n, b)/Γ(n). (2.8)

The p-value for exclusion is the probability that data generated under hypothesis H_0 = H_{s+b} is equally or more background-like than the actual observed number of events n:

p_excl(n, b, s) = Σ_{k=0}^n P(k|s + b) = Γ(n + 1, s + b)/Γ(n + 1). (2.9)

In these equations, γ(z, x) and Γ(z, x) are the lower and upper incomplete gamma functions, respectively, defined by

γ(z, x) = ∫_0^x dt t^{z−1} e^{−t},   Γ(z, x) = ∫_x^∞ dt t^{z−1} e^{−t}, (2.10)

so that Γ(z) = γ(z, x) + Γ(z, x) is the ordinary gamma function. The CL_s statistic for exclusion in this case is

CL_s(n, b, s) = p_excl(n, b, s)/p_excl(n, b, 0) = Γ(n + 1, s + b)/Γ(n + 1, b). (2.11)

This is larger than p_excl(n, b, s) by a factor Γ(n + 1)/Γ(n + 1, b). Figure 2.1 illustrates the idea of the CL_s method [51][52][53][54]. In the figure, p_excl(n, b, s) (the shaded area under the blue histograms) is divided by p_excl(n, b, 0) (the shaded area under the red histograms) to give CL_s. The first panel shows the case b = 2.2, s = 8.4, and n = 5. In situations like this, where the H_b and H_{s+b} hypothesis distributions do not have much overlap, p_excl and CL_s evaluate to very similar results, because the denominator of the CL_s definition is close to 1. For this particular case, one finds p_excl = 0.0475 and CL_s = 0.0487, and by either criterion one would report a better than 95% exclusion.
The second panel of Figure 2.1 illustrates the case b = 8.4, s = 2.2, and n = 5, so that the overlap between the distributions for H_b and H_{s+b} is much larger. In cases like this with a larger overlap (i.e. the signal region is polluted by the background), statistical conclusions based on p_excl alone can be too aggressive. Since we engineered this example to have the same s + b and n as for the first panel, we get the same† p_excl = 0.0475, which taken at face value would again give a better than 95% exclusion. However, proponents of the CL_s criterion point out that here it must be recognized that for b = 8.4, the outcome n ≤ 5 would have been a low-probability occurrence no matter what‡ the signal mean s was. Thus, the frequentist p_excl is really telling us more about the observed data than making a useful statement about the signal hypothesis. One finds that CL_s = 0.3022, and using this one would, sensibly and conservatively, refrain from excluding the signal hypothesis.
In fact, no matter the outcome for n, the experiment with b = 8.4 simply lacks the statistical ability to exclude the s = 2.2 signal model at 90% confidence, according to the CL s statistic. This can be seen by computing it for the least signal-like outcome, n = 0, which gives CL s = 0.1108. One possible practical interpretation of the very small p excl in such cases with n significantly less than b might be that the background estimate could be wrong for reasons unknown, while another is that the background simply fluctuated low from its true mean. In any case, the intuitive interpretation of the CL s statistic is that the quoted significance for exclusion should be reduced from the usual frequentist value, due to the large overlap between the signal+background region and the background-only region.
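The single-channel formulas above can be checked by summing the Poisson series directly, with no need for incomplete gamma functions. A minimal Python sketch that reproduces the numbers quoted in the discussion of Figure 2.1 (the function names are ours):

```python
import math

def poisson_cdf(n, mu):
    """P(k <= n | mu), equal to Gamma(n+1, mu)/Gamma(n+1)."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

def p_excl(n, b, s):
    """Frequentist exclusion p-value, eq. (2.9)."""
    return poisson_cdf(n, s + b)

def cls(n, b, s):
    """Modified frequentist CL_s, eq. (2.11)."""
    return p_excl(n, b, s) / poisson_cdf(n, b)
```

For the first panel (b = 2.2, s = 8.4, n = 5) this gives p_excl ≈ 0.0475 and CL_s ≈ 0.0487; for the second panel (b = 8.4, s = 2.2, n = 5), p_excl is unchanged but CL_s ≈ 0.3022, and for the least signal-like outcome n = 0 one finds CL_s = e^{−2.2} ≈ 0.1108.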
Indeed, if the number of events is sufficiently small, one finds that the usual frequentist p-value would correspond to an exclusion even in cases that defy sensible practical interpretation. Considering the case n = 0 more generally, one finds p_excl(n = 0, b, s) = e^{−(s+b)}, which becomes arbitrarily small for any fixed s, if b is sufficiently large. One could use this to make an absurd claim of exclusion for a model that predicted s = 10^{−500} or even s = 0 exactly, simply by observing a smaller than expected number of events, if the background is large enough! In contrast, usage of the statistic CL_s(n = 0, b, s) = e^{−s} conforms to the intuitively reasonable idea that, as an absolute prerequisite for excluding a signal hypothesis, the expected signal strength must not be too small. Specifically, only models that predict s > − ln(0.05) ≈ 2.996 can be excluded at 95% confidence according to the CL_s measure, for any b and for any possible experimental outcome n. Similarly, 90% exclusion by the CL_s method requires s > − ln(0.1) ≈ 2.303.

† The general fact that p_excl(n, b, s) depends only on the sum s + b, and not on s or b separately, is a clear reason to reject it as a measure of confidence in the presence of the signal model, because it says that any exclusion for signal s and background b would imply an equally strong exclusion for the case that the signal is s = 0 if the background b were increased by the numerical value of s.

‡ Here we are taking it as a requirement that s ≥ 0, although in some situations quantum interference with the background could allow for s < 0. See, for example, the case of a digluon resonance at the LHC [59].

Figure 2.1: In both plots, the observed number of events is n = 5. In the first plot, there is little overlap between the distributions from the H_b and H_{s+b} hypotheses, and p_excl = 0.0475 and CL_s = 0.0487, so one would report better than 95% exclusion using either criterion. In the second plot, the overlap is much larger. Although p_excl = 0.0475 is the same (since s + b and n did not change), one finds CL_s = 0.3022, and one refrains from reporting an exclusion of the hypothesis H_{s+b}.

The dependence of the exclusion significance on b is shown for fixed s = 4 and n = 0, 1, 2, 3 in Figure 2.2. For very small b, the two statistics are nearly equal, p_excl ≈ CL_s. For any fixed n, in the limit of large b one has CL_s = e^{−s}, while p_excl becomes absurdly small in comparison, which would imply an absurdly large Z_excl.
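The limiting behavior just described is easy to verify numerically. In the sketch below (our own illustrative choice n = 0, s = 4), CL_s stays pinned at e^{−s} ≈ 0.0183 regardless of b, while p_excl collapses toward zero as b grows:

```python
import math

def poisson_cdf(n, mu):
    """P(k <= n | mu)."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

s, n = 4.0, 0
for b in (0.1, 1.0, 10.0, 50.0):
    pe = poisson_cdf(n, s + b)            # p_excl; equals exp(-(s+b)) for n = 0
    cl = pe / poisson_cdf(n, b)           # CL_s;   equals exp(-s), independent of b
    print(f"b = {b:5.1f}   p_excl = {pe:.3e}   CL_s = {cl:.6f}")
```

By the p_excl criterion the s = 4 hypothesis would look arbitrarily strongly excluded at large b, whereas CL_s reports the same (modest) exclusion no matter how large the background.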
Non-observation of a significant excess above background expectations can be used to constrain new physics. In particular, for a single-channel counting experiment, the minimum signal needed to claim an exclusion at a given confidence level 1 − α, equivalent to significance Z = √2 erfc^{−1}(2α), for a perfectly known background mean b, is obtained [60,61] by solving for s in either

α = Γ(n + 1, s + b)/Γ(n + 1)   (p_excl method) (2.12)

in the standard frequentist approach, or

α = Γ(n + 1, s + b)/Γ(n + 1, b)   (CL_s method) (2.13)

in the modified frequentist approach. Upper limits on s can also be obtained with the Feldman-Cousins (FC) method [62], which constructs frequentist confidence intervals using a likelihood-ratio ordering of the possible outcome distributions. The solid black lines in Figure 2.3 show the results obtained by the FC method after requiring them to be non-increasing as a function of the background mean. It is clear from the figure that the upper limits on s obtained using the standard frequentist p_excl approach are the least conservative, and can even go negative in the case where the number of observed events n is small compared to the expected background mean. For a fixed n, despite the upper limits given by the CL_s and FC methods being very different from each other, we note that they are both almost flat at very small backgrounds and then decrease slowly (or stay constant) as a function of background, always remaining positive. For small b the FC upper limits are more conservative, and for large b, the CL_s upper limits are more conservative. The other striking difference between these two upper limits is that, for n = 0, the FC upper limits decrease with b, but the CL_s upper limits are independent of b. In particular, at a chosen confidence level 1 − α, for n = 0 the CL_s upper limit on s is − ln(α). The same result also holds for any n in the limit that the background is extremely large. At 90% (95%) CL, the upper limit given by CL_s for n = 0, or for any n as b → ∞, is around 2.303 (2.996). On the other hand, the upper limit given by the FC method decreases as a function of b and approaches a constant value at large b. For example, for n = 0, the 90% (95%) CL upper limit given by the FC method, after requiring it to be non-increasing as a function of b, is approximately 0.8 (1.34) at large b.
It is important for the following that the result for CL_s(n, b, s) in the case of a single Poisson channel in eq. (2.11) can also be obtained [63] as a Bayesian credible interval, using a flat prior for the signal and likelihoods L(s|n, b) ∝ P(n|s + b):

CL_excl(n, b, s) ≡ ∫_s^∞ ds′ P(n|s′ + b) / ∫_0^∞ ds′ P(n|s′ + b). (2.14)

Performing the integrations, CL_excl(n, b, s) as defined by eq. (2.14) is precisely equal to CL_s(n, b, s) as defined by eq. (2.11).§ However, despite the numerical equivalence, the interpretation is quite different, since the ratio of frequentist p-values is not directly a Bayesian credible interval. Moreover, the equivalence between CL_s and CL_excl is only approximate in more complicated generalizations. Looking ahead to the case of experiments which collect counts in multiple independent channels governed by Poisson statistics, and which may have nuisance parameters including uncertainties in the backgrounds, we will argue for a generalization based straightforwardly on the Bayesian version CL_excl as given in eq. (2.14) rather than CL_s given in eq. (2.4) or its specialization eq. (2.11).

For a single-channel counting experiment, the discovery confidence level statistic defined in eq. (2.6) becomes

CL_disc(n, b, s) = P(n|b)/P(n|s + b) = e^{s} [b/(s + b)]^n, (2.15)

which can be used in place of p in eq. (2.1) to obtain a discovery significance. (If the result is greater than 1, then clearly no discovery claim should be contemplated.) Note that unlike p_disc(n, b), the result for CL_disc(n, b, s) depends on the strength of the signal whose discovery is under investigation. It is always more conservative than p_disc(n, b) in claiming discovery, just as CL_s is more conservative than p_excl in claiming exclusion. For example, in the extreme case s = 0, one has CL_disc(n, b, s = 0) = 1 for any b and n, so one would never claim discovery using that criterion. In contrast, the frequentist statistic p_disc(n, b) can be arbitrarily small, implying an arbitrarily large discovery significance Z, even in situations where the physics provides absolutely no possible source for a signal.¶

§ If the signal mean is instead allowed to be negative with s + b ≥ 0 (see previous footnote), then the flat prior in eq. (2.14) extends down to s′ = −b, and performing the integrations gives CL_excl = Γ(n + 1, s + b)/Γ(n + 1) = p_excl.

As we will see below, CL_disc also generalizes more straightforwardly to cases that have multiple independent channels governed by Poisson statistics, and which may have nuisance parameters including uncertainties in the backgrounds. Figure 2.4 compares the discovery significance obtained from p_disc and CL_disc as a function of s for fixed n, with different curves for different values of b. Note that the discovery significance obtained from CL_disc, which is always more conservative than that of p_disc, is maximized at s = n − b.
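The closed form in eq. (2.15) makes these properties easy to check numerically. A minimal sketch (the values n = 12, b = 3 are hypothetical, chosen only for illustration): the grid minimum of CL_disc, i.e. the maximum discovery significance, lands at s = n − b = 9, and at s = 0 the statistic is exactly 1.

```python
import math

def cl_disc(n, b, s):
    """CL_disc(n, b, s) = P(n|b)/P(n|s+b) = exp(s) * (b/(s+b))**n, eq. (2.15)."""
    return math.exp(s) * (b / (s + b)) ** n

n, b = 12, 3.0
# Scan s on a fine grid: CL_disc is minimized (significance maximized) at s = n - b.
grid = [0.01 * i for i in range(1, 3001)]
s_best = min(grid, key=lambda s: cl_disc(n, b, s))
print(s_best)
```

The location of the minimum follows from d ln CL_disc/ds = 1 − n/(s + b), which vanishes at s = n − b.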
Given the number of observed events n and an expected background mean, the standard p-value for discovery p_disc does not depend on the signal. So, for a perfectly known background mean b, we can compute the number of events needed for discovery at a significance Z by solving for n from [see eqs. (2.1) and (2.8)]

(1/2) erfc(Z/√2) = γ(n, b)/Γ(n). (2.16)

On the other hand, CL_disc depends also on the signal, in which case the number of events needed for discovery for a known background b and signal mean s at a given significance Z can be obtained by solving for n from [see eqs. (2.1) and (2.15)]

(1/2) erfc(Z/√2) = e^{s} [b/(s + b)]^n. (2.17)

Figure 2.5 shows the number of events needed for Z = 3 evidence (left panel) and Z = 5 discovery (right panel) given by the p_disc approach (solid black lines), and by the CL_disc approach for two choices of the signal mean, s = 2 (dashed red lines) and 10 (dashed blue lines), as functions of b. It is clear from the figure that, for a given background mean, the observed number of events needed for discovery given by the CL_disc approach is at least as large as the result given by the p_disc criterion, and often much larger when the background is not very small.

We now turn to the question of projecting expectations for exclusion and discovery at ongoing and future experiments. In simulations or assessments of a proposed experiment, one considers the statistics of pseudo-data generated under an alternative hypothesis H_1. For assessments of prospects for exclusion the alternative hypothesis is that the signal source is absent, H_1 = H_b, while for discovery the pseudo-data is generated assuming that both signal and background are present, H_1 = H_{s+b}. A common way to project an expected result is to set the number of events n equal to the median expected value under the hypothesis H_1. However, due to the discrete nature of Poisson statistics, the median expected outcome has the striking flaw that it can predict smaller significances if an experiment takes more data or reduces its background.
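For integer n, the counts needed for discovery can be found by direct search rather than by root-finding. A minimal sketch (the choices b = 1 and s = 10 are hypothetical, for illustration only):

```python
import math

P5 = 2.867e-7   # p-value corresponding to Z = 5

def poisson_sf(n, mu):
    """p_disc(n, mu) = P(k >= n | mu)."""
    return 1.0 - sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n))

def n_for_discovery_pdisc(b, threshold=P5):
    """Smallest n with p_disc(n, b) below the threshold."""
    n = 0
    while poisson_sf(n, b) >= threshold:
        n += 1
    return n

def n_for_discovery_cldisc(b, s, threshold=P5):
    """Smallest n with CL_disc(n, b, s) = exp(s)*(b/(s+b))**n below the threshold."""
    n = 0
    while math.exp(s) * (b / (s + b)) ** n >= threshold:
        n += 1
    return n

print(n_for_discovery_pdisc(1.0), n_for_discovery_cldisc(1.0, 10.0))
```

Consistent with the discussion of the figure, the CL_disc criterion never demands fewer events than p_disc for the same background.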
This counterintuitive feature of the median expected significance was pointed out and studied in detail in refs. [64,65], and in [50], where it was referred to as the "sawtooth problem". It occurs for the median expected CL_s and CL_disc as well; their sawtooth behavior as a function of the background mean b, for various values of the signal mean s, is illustrated in Figure 2.6. Therefore, in ref. [50], we proposed instead to use an exact Asimov approach for projecting sensitivities of planned experiments, where the observed number of events n is replaced by its mean expected value, n_excl = b for exclusion and n_disc = s + b for discovery. From eqs. (2.9) and (2.11) we thus obtain, for the expected exclusion in the case of a single-channel counting experiment with signal and background means s and b,

p^A_excl = Γ(b + 1, s + b)/Γ(b + 1), (2.18)
CL^A_s = Γ(b + 1, s + b)/Γ(b + 1, b). (2.19)

Similarly, for the expected discovery significance, we obtain from eqs. (2.8) and (2.15)

p^A_disc = γ(s + b, b)/Γ(s + b), (2.20)
CL^A_disc = e^{s} [b/(s + b)]^{s+b}. (2.21)

Figure 2.7 compares the exact Asimov expected significances obtained from the frequentist (dashed lines) and modified frequentist CL_s/Bayesian CL_disc (solid lines) confidence levels, for both the exclusion (left panel) and discovery (right panel) cases. This illustrates the more general fact that CL_s and CL_disc are more conservative than p_excl and p_disc, respectively. In order to project expected exclusions based on the p_excl or CL_s approaches, we set eq. (2.18) or (2.19) equal to the desired α = 0.10 or 0.05, and then solve for s. We also consider projections based on the FC method, in two different ways. One is the Feldman-Cousins experimental sensitivity, advocated within ref.
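The exact Asimov expected CL_disc of eq. (2.21) is closed-form, so its qualitative behavior is easy to probe. A minimal check (with hypothetical values s = 2 and b = 1 or 5) that a larger background weakens the expected discovery:

```python
import math

def cl_disc_asimov(s, b):
    """Exact Asimov expected CL_disc, eq. (2.21): replace n by its mean s + b
    in eq. (2.15), giving exp(s) * (b/(s+b))**(s+b)."""
    return math.exp(s) * (b / (s + b)) ** (s + b)

# More background means a weaker expected discovery (larger CL^A_disc):
print(cl_disc_asimov(2.0, 1.0), cl_disc_asimov(2.0, 5.0))
```

One can verify analytically that d ln CL^A_disc/db = s/b − ln(1 + s/b) > 0 for all s, b > 0, so the expected discovery significance always decreases monotonically with the background, as it should.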
[62], that is defined as the arithmetic mean of the upper limits s^UL_FC(n, b) obtained by the FC method at a chosen confidence level, in a large number of pseudo-experiments with data generated under the background-only hypothesis. The other way is to simply compute the upper limit on the signal given by the FC method with the observed number of events taken to be the nearest integer to the expected background mean, n = round(b). We consider the latter for future reference, as it was alluded to in ref. [66] while projecting the exclusion sensitivity for proton decay in the $p \rightarrow \overline\nu K^+$ channel at DUNE. In Figure 2.8, we compare the expected 90% CL (left panels) and 95% CL (right panels) upper limits on the signal mean s, obtained using the exact Asimov CL_s (blue lines) and p_excl (red lines), the FC experimental sensitivity (green lines), and the FC upper limit with n = round(b) (black lines). We note the following from the figure. First, unlike the case with the observed upper limits (i.e. fixed n), the p_excl method gives sensible positive expected upper limits with the exact Asimov approach for all b, but is still less conservative than the CL_s and FC sensitivity results. Second, the upper limit given by the FC method with n = round(b) suffers from a sawtooth problem, and is therefore counterintuitive and flawed as a method of comparing experimental prospects for different scenarios, as it implies that an experiment could become more sensitive if it had a larger background. Finally, the FC sensitivity and the upper limits given by the exact Asimov CL_s are both sensible, as they increase monotonically with b, and are also comparable at small backgrounds. At large backgrounds, however, the FC sensitivity is slightly more conservative. We also note that the CL_s upper limits are much easier to evaluate than the FC upper limits.
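The exact Asimov CL_s upper limit of eq. (2.19) involves the upper incomplete gamma function with a real (non-integer) first argument, which is not in the Python standard library; the sketch below evaluates it with a simple Simpson quadrature (grid sizes and cutoffs are our illustrative choices, adequate for moderate b) and then bisects for the limit on s:

```python
import math

def upper_gamma(a, x, steps=4000):
    """Upper incomplete gamma Gamma(a, x) = integral_x^inf t^(a-1) e^(-t) dt,
    by Simpson's rule on a finite range (the truncated tail is negligible here)."""
    hi = x + a + 40.0 + 10.0 * math.sqrt(a + 1.0)
    h = (hi - x) / steps
    f = lambda t: t ** (a - 1.0) * math.exp(-t) if t > 0.0 else 0.0
    total = f(x) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(x + i * h)
    return total * h / 3.0

def cls_asimov(b, s):
    """Exact Asimov CL_s, eq. (2.19): Gamma(b+1, s+b) / Gamma(b+1, b)."""
    return upper_gamma(b + 1.0, s + b) / upper_gamma(b + 1.0, b)

def s_upper_limit(b, alpha=0.1):
    """Expected upper limit on s at CL 1 - alpha: solve CL^A_s(b, s) = alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if cls_asimov(b, mid) > alpha:
            lo = mid           # need a larger s to reach the target CL
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the b → 0 limit this reproduces the exclusion floor noted earlier, s → − ln(α) ≈ 2.303 at 90% CL, and the limit grows monotonically with b, consistent with the behavior of the blue curves in Figure 2.8.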
We now turn to the issue of prospects for discovery, using the exact Asimov criterion. The signal mean needed for an expected discovery at a significance Z is given by the solution for s obtained by setting eq. (2.20) for p^A_disc, or eq. (2.21) for CL^A_disc, equal to (1/2) erfc(Z/√2) for the desired Z. Figure 2.9 compares the signals s needed for an expected Z = 3 evidence or Z = 5 discovery, as a function of the background mean b, based on p^A_disc and CL^A_disc. We note that, as expected, the results from CL^A_disc are more conservative than those obtained from p^A_disc. For very small b, note that for Z = 3 the s needed in Figure 2.9 is actually less than 1. Here, it is important to note that the discovery statistics p_disc and CL_disc are not well-defined in the strict background-free limit b → 0. Specifically, both p_disc(n, b) and CL_disc(n, b, s) vanish as b → 0 for any observed n ≥ 1. Since n_disc = s for b = 0, the above implies that the exact Asimov expected discovery significances are both infinite, Z(p^A_disc) = Z(CL^A_disc) = ∞, for any non-zero s (however small). However, as a practical matter, it is clearly unreasonable to suggest an expectation of a discovery if the mean expected number of signal events is much less than 1. Therefore, in order to be conservative, in cases with an extremely small background we can impose an additional requirement that P(n ≥ 1) should be greater than some fixed value in order to claim an expected discovery. Figure 2.10 shows the probability of observing at least one event, P(n ≥ 1) = 1 − e^{−(s+b)}.
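The extra requirement is elementary to implement; for example, demanding P(n ≥ 1) = 1 − e^{−(s+b)} > 1/2 with negligible background amounts to requiring s > ln 2:

```python
import math

def prob_at_least_one(s, b):
    """P(n >= 1 | s + b) = 1 - exp(-(s + b))."""
    return 1.0 - math.exp(-(s + b))

# With b ~ 0, the P(n >= 1) > 0.5 requirement is crossed exactly at s = ln 2:
print(prob_at_least_one(math.log(2.0), 0.0))
```

The threshold value 1/2 here is only a hypothetical choice to illustrate the requirement; the appropriate fixed value is a matter of convention.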

C. Exclusion for multi-channel counting experiments
Consider a counting experiment with N independent channels. For each channel i = 1, . . . , N, the background and possible signal are assumed to be governed by Poisson distributions with means b_i and s_i. For future convenience, define s ≡ Σ_i s_i and r_i ≡ s_i/s, so that s is the total mean expected signal in all channels, and the r_i are the expected fractions of the total signal events for each channel.
Given an observation {n_i}, the frequentist p-value for exclusion is

p_excl = Σ_{k_i} Π_{i=1}^N P(k_i | s_i + b_i), (2.28)

where the sums over non-negative integer numbers of events {k_i} are restricted according to the condition that

Q({k_i}) ≤ Q({n_i}), (2.29)

where Q is an appropriately chosen test-statistic with the property that larger Q is more signal-like. We can also compute

p_b = Σ_{k_i} Π_{i=1}^N P(k_i | b_i), (2.30)

with the same restrictions on the k_i as in eq. (2.29). Then we have

CL_s = p_excl / p_b, (2.31)

which is interpreted as the confidence level in the hypothesis that the signal is present.
For the single-channel case, the obvious choice for Q is the observed number of events, but in the multi-channel case one can consider different choices for Q. A simple and good choice*** of test-statistic is the likelihood ratio

q = Π_{i=1}^N P(n_i | s_i + b_i)/P(n_i | b_i). (2.33)

It is more convenient to use instead

Q = ln(q) = Σ_{i=1}^N n_i ln(1 + s_i/b_i) − s, (2.34)

which gives exactly the same results for p_excl and CL_s as Q = q, since ln(q) increases monotonically with q. The contribution −s is an irrelevant constant (independent of the

*** There are other choices, including the profile likelihood ratio, but these are more complicated and end up giving very similar (and often identical) results.
data {n_i}), so the use of Q = ln(q) amounts to taking the sum of the individual n_i's, but weighting each of the channels by the factor w_i = ln(1 + s_i/b_i). This means that, using eq. (2.34) in eq. (2.29), the restriction on the {k_i} appearing in the sums in eqs. (2.28) and (2.30) becomes

Σ_{i=1}^N w_i k_i ≤ Σ_{i=1}^N w_i n_i. (2.35)

In contrast, the Bayesian way is to define, as a generalization of eq. (2.14),

CL_excl({n_i}, {b_i}, s) ≡ ∫_s^∞ ds′ Π_{i=1}^N P(n_i | r_i s′ + b_i) / ∫_0^∞ ds′ Π_{i=1}^N P(n_i | r_i s′ + b_i). (2.36)

Unlike in the special case of a single channel, CL_excl defined in this way is not exactly equal to CL_s defined by eq. (2.31). Therefore, we will now study some simple test cases to illustrate the differences. First, let us consider what happens when there are two channels, one of which (the "bad", or non-informative, channel) has a much lower signal and higher background than the other (the "good" channel). As a specific numerical case, suppose that the good channel 1 observes n_1 = 2 events, while the bad channel 2 has a tiny expected signal s_2 and a much larger background b_2; the resulting statistics as functions of the observed n_2 are shown in the left panel of Figure 2.11. Counterintuitively, adding another channel with a larger background and almost no expected signal has increased our confidence in the exclusion, as measured by either the frequentist p_excl or the modified frequentist CL_s measures, when n_2 is small. In contrast, CL_excl behaves as intuitively expected; the result obtained including both channels is numerically almost independent of n_2 and almost identical to the result obtained only from channel 1.
To understand the origin of this counterintuitive effect for p_excl and CL_s, let us consider which integers k_1, k_2 contribute to the sums in eqs. (2.28) and (2.30). In general, k_1 = 0 and 1 each contribute for a very large range of k_2, so that very nearly we have a factor Σ_{k_2=0}^∞ P(k_2 | s_2 + b_2) ≈ 1 for channel 2 in eq. (2.28). However, for k_1 = n_1 = 2, we only get a factor of Σ_{k_2=0}^{n_2} P(k_2 | s_2 + b_2) < 1 contributing to the p-values. The problem boils down to this fact: for the contributions with k_1 = n_1, only a subset of the k_2 values contribute, even though any result for k_2 should give us essentially no information about the presence of the (tiny) signal. This explains why the counterintuitive problem disappears for reasonably large n_2, where we see from the left panel of Figure 2.11 that CL_excl ≈ CL_s and p_excl agree with their counterparts from channel 1 only.
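This two-channel behavior can be reproduced with a brute-force numerical sketch. The channel values below (s_1 = 4, b_1 = 0.5, n_1 = 2 for the good channel; s_2 = 0.02, b_2 = 10, n_2 = 2 for the bad channel) are our own hypothetical choices, not the paper's, picked only to make the effect visible:

```python
import math
from itertools import product

def pois(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

def cls_multi(n, b, s, kmax=100):
    """Modified frequentist CL_s with the weighted-count ordering of eq. (2.35):
    sum over {k_i} satisfying sum_i w_i k_i <= sum_i w_i n_i."""
    w = [math.log(1.0 + si / bi) for si, bi in zip(s, b)]
    q_obs = sum(wi * ni for wi, ni in zip(w, n))
    num = den = 0.0
    for ks in product(range(kmax), repeat=len(n)):
        if sum(wi * ki for wi, ki in zip(w, ks)) <= q_obs + 1e-12:
            num += math.prod(pois(ki, si + bi) for ki, si, bi in zip(ks, s, b))
            den += math.prod(pois(ki, bi) for ki, bi in zip(ks, b))
    return num / den

def cl_excl_multi(n, b, s_tot, r, hi=60.0, steps=6000):
    """Bayesian CL_excl of eq. (2.36), via Simpson integration over s'."""
    def integrand(sp):
        out = 1.0
        for ni, bi, ri in zip(n, b, r):
            out *= pois(ni, ri * sp + bi)
        return out
    def simpson(a, c):
        h = (c - a) / steps
        tot = integrand(a) + integrand(c)
        for i in range(1, steps):
            tot += (4 if i % 2 else 2) * integrand(a + i * h)
        return tot * h / 3.0
    return simpson(s_tot, hi) / simpson(0.0, hi)

one_s = cls_multi((2,), (0.5,), (4.0,))                       # good channel alone
two_s = cls_multi((2, 2), (0.5, 10.0), (4.0, 0.02))           # add the bad channel
one_x = cl_excl_multi((2,), (0.5,), 4.0, (1.0,))
two_x = cl_excl_multi((2, 2), (0.5, 10.0), 4.02, (4.0 / 4.02, 0.02 / 4.02))
print(one_s, two_s, one_x, two_x)
```

With these inputs, CL_s drops substantially when the non-informative channel is added (since n_2 is small compared to b_2), while CL_excl barely moves; in the single-channel case CL_excl and CL_s agree, as they must.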
To show another facet of this disturbing effect, in the right panel of Figure 2.11 we use the same data, except that n_2 = 0 is fixed and b_2 is varied. Again, we see that despite channel 2 containing essentially no information about the signal, the modified frequentist CL_s including both channels depends on b_2, while CL_excl is almost exactly flat, conforming to intuitive expectation. Another study case is shown in the first panel of Figure 2.12, in which s_2 is varied: the results are discontinuous at s_2 = 4, changing abruptly between s_2 = 4 − ε and s_2 = 4 + ε for arbitrarily small but non-zero ε. This discontinuity can be traced to the fact that for s_2 = 4 exactly, the weights satisfy w_1 = w_2 exactly for the two channels, which affects which integers are summed over due to eq. (2.35). There are also discontinuities in CL_s at s_2 = √5 − 1 ≈ 1.23607, where w_1 = 2w_2, and at s_2 = 5^{1/3} − 1 ≈ 0.709976, where w_1 = 3w_2, etc.
For another case study, the results are depicted in the second panel of Figure 2.12; they show more pronounced discontinuities in both the frequentist p_excl and the modified frequentist CL_s. In contrast, the Bayesian result CL_excl is smooth as s_2 is varied, and gives more conservative exclusion significances.
Let us now consider the question of projecting expected exclusion significances for future experiments. In the multi-channel case, one can define Asimov results for p_excl and CL_s by replacing each n_i in eqs. (2.28) and (2.31) by the mean expected result b_i in the restriction eq. (2.35). However, in the multi-channel case, the resulting sets of {k_i} that contribute to the sums depend discontinuously on the {s_i} and {b_i}, leading to the same sort of sawtooth problems that occur in the median expected significance. In particular, an increase in the backgrounds often leads, counterintuitively, to a larger expected significance. (This problem did not occur in the single-channel case, because the sum Σ_{k=0}^{n} was evaluated in closed form in terms of incomplete Γ functions, after which the argument n could be interpreted as a continuous real variable rather than an integer.) In contrast, if one uses CL_excl(n, b, s), then the exact Asimov method is perfectly straightforward and continuous, since it does not involve sums over integers subject to restrictions. Thus one can simply replace n_i by b_i in eq. (2.36) to obtain the exact Asimov result. The Asimov results for p_excl, CL_s, and CL_excl are compared in Figure 2.13 for two test cases, showing the sawtooth behavior of the first two and the smooth, monotonic (and more conservative) behavior of the latter.
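The exact Asimov evaluation just described can be sketched as follows. The continuation of the cumulative Poisson sum to non-integer n uses the regularized upper incomplete gamma function, P(k ≤ n | μ) = Q(n + 1, μ); the series implementation and the numerical inputs are our own choices (stdlib only, adequate for moderate arguments).

```python
# Exact Asimov CL_excl: replace each n_i by b_i in the Bayesian
# multi-channel statistic. The cumulative Poisson sum extends smoothly
# to non-integer n via Q(n+1, mu), so the result is continuous in the
# background means.
import math

def reg_lower_gamma(a, x, terms=800):
    """Regularized lower incomplete gamma P(a, x), via the standard
    series expansion; adequate for the moderate a, x used here."""
    if x <= 0.0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    term, total = 1.0 / a, 0.0
    for n in range(terms):
        total += term
        term *= x / (a + n + 1.0)
    return math.exp(log_pref) * total

def Q(a, x):
    """Regularized upper incomplete gamma, Q(a, x) = 1 - P(a, x)."""
    return 1.0 - reg_lower_gamma(a, x)

def asimov_cl_excl(b, s):
    """Product over channels of Q(b_i+1, s_i+b_i) / Q(b_i+1, b_i)."""
    out = 1.0
    for bi, si in zip(b, s):
        out *= Q(bi + 1.0, si + bi) / Q(bi + 1.0, bi)
    return out

# Smooth and monotonically decreasing in the signal means:
print(asimov_cl_excl([1.7, 0.4], [3.0, 1.2]))
```

For integer n the identity Q(n + 1, μ) = Σ_{k=0}^{n} P(k | μ) can be checked directly, which is what makes the real-n continuation unambiguous.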
In view of the preceding discussion, we propose CL_excl in eq. (2.36) as the preferred statistic for exclusion in multi-channel counting experiments. Unlike p_excl and CL_s (the latter of which it coincides with in the single-channel case), it does not suffer from being significantly affected by the presence of a bad channel, and it does not have discontinuities when the signal and background means are changed infinitesimally. The exact Asimov result is straightforward to obtain and behaves continuously and monotonically, in the expected way, with respect to changes in the background. Furthermore, the introduction of background uncertainties and probability distributions for nuisance parameters is more straightforward, avoiding discontinuities in the integrand, as we will see below.

D. Discovery for multi-channel counting experiments
For the discovery case, the frequentist p-value is defined by

  p_disc(n, b, s) = Σ_{{k_i}} Π_i P(k_i | b_i),    (2.47)

where the sum over {k_i} is restricted by the condition that the test statistic ln(q) defined by eq. (2.34) is not smaller for {k_i} than for the observed data {n_i}, so:

  Σ_i w_i k_i ≥ Σ_i w_i n_i.    (2.48)

Unlike in the single-channel special case, p_disc depends on the signal strengths s_i when there is more than one channel, because of this restriction. Note that the inequality has the opposite sense compared to the exclusion case, eq. (2.35). A more conservative, and simpler, alternative to p_disc(n, b, s) is the generalization of eq. (2.15):

  CL_disc(n, b, s) = [Π_i Σ_{k_i=n_i}^{∞} P(k_i | b_i)] / [Π_i Σ_{k_i=n_i}^{∞} P(k_i | s_i + b_i)].    (2.49)

In order to compare these criteria for discovery, we first consider a case with one good channel and one bad channel (numerical values as in Figure 2.14). In Figure 2.14, we show the results for the discovery significance Z obtained from p_disc and CL_disc, considering variations in both n_2 and b_2 as the other quantities are held fixed, and compare to the same results using only channel 1. As in the exclusion case, we note that p_disc is affected in a non-trivial way by the presence of the bad channel, contrary to intuitive expectations. The step-function discontinuities in p_disc are not a numerical artifact; they occur at values of b_2 such that the ratio of weights w_1/w_2 = ln(1 + s_1/b_1)/ln(1 + s_2/b_2) is a rational number, so that the number of terms appearing in the restricted sum over {k_i} in eq. (2.47) changes discontinuously. In contrast, CL_disc is seen to be much less affected by the presence of the bad channel. The reason for this is that for any channel i with very small s_i, the numerator and denominator factors for that channel cancel in the limit s_i/b_i → 0 in eq. (2.49). The exception (in the right panel of Figure 2.14) occurs when b_2 is also small, in which case n_2 = 10 is a surprising outcome for both the background-only and the background+signal hypotheses.
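The conservative statistic of eq. (2.49), and its conversion to a significance Z via CL_disc = (1/2) erfc(Z/√2), can be sketched directly; the numbers below are illustrative, and the per-channel cancellation for a small-signal channel is visible in the code as a ratio of nearly equal tail sums.

```python
# Bayesian discovery statistic CL_disc: per-channel ratios of
# upper-tail Poisson sums, background-only over background+signal.
import math
from statistics import NormalDist

def pois(k, mu):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def tail(n, mu):
    """P(k >= n | mu) = 1 - P(k <= n-1 | mu)."""
    return 1.0 - sum(pois(k, mu) for k in range(n))

def cl_disc(n, b, s):
    out = 1.0
    for ni, bi, si in zip(n, b, s):
        out *= tail(ni, bi) / tail(ni, si + bi)
    return out

def z_from_cl(cl):
    """Significance Z defined by CL_disc = (1/2) erfc(Z / sqrt(2))."""
    return NormalDist().inv_cdf(1.0 - cl)

# Good channel plus a bad channel with s2 -> 0: the channel-2 factor
# tail(n2, b2)/tail(n2, s2 + b2) -> 1, so it drops out (illustrative values):
print(z_from_cl(cl_disc([12], [3.0], [9.0])))
print(z_from_cl(cl_disc([12, 8], [3.0, 8.0], [9.0, 0.01])))
```

The two printed significances are nearly equal, in contrast to the behavior of p_disc described above.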
Further comparisons between the significances obtained from p_disc and CL_disc for two test cases are shown in Figure 2.15. The results obtained from p_disc have numerous discontinuities, which are numerically small but have the disturbing property of being non-monotonic as the background b_2 is varied. The results from CL_disc are reliably more conservative, as we have already noted, and do not suffer from discontinuities, because there is no restricted sum over integers in the definition of CL_disc.
For the purpose of projecting discovery prospects at future experiments, one can again define the Asimov values of p_disc and CL_disc by replacing n_i with b_i + s_i in eqs. (2.47) and (2.49), respectively. These are compared for two test cases in Figure 2.16. In the case of p_disc, the constraint placed on the sum by eq. (2.48) leads to a non-monotonic sawtooth behavior, although much less pronounced than in the exclusion case of Figure 2.13.
For the reasons just discussed, and because of the ease of generalization to the case of background uncertainties as discussed in the next section, we propose to use CL_disc as the figure of merit for the significance of a possible discovery, and for projecting the discovery reach of future experiments. The Asimov p_disc has a counterintuitive, non-monotonic behavior as the first-channel background mean b_1 is varied, while the Asimov CL_disc is monotonic in the expected way, and more conservative.
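The Asimov CL_disc is continuous in the same way as its exclusion counterpart, because the upper-tail Poisson sum P(k ≥ n | μ) equals the regularized lower incomplete gamma function P(n, μ), which extends to real n. A minimal sketch, with our own series implementation and illustrative inputs:

```python
# Exact Asimov CL_disc: replace each n_i by s_i + b_i. The upper-tail
# Poisson sum P(k >= n | mu) = P(n, mu) (regularized lower incomplete
# gamma) extends smoothly to non-integer n.
import math
from statistics import NormalDist

def reg_lower_gamma(a, x, terms=800):
    """Regularized lower incomplete gamma P(a, x), by series expansion."""
    if x <= 0.0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    term, total = 1.0 / a, 0.0
    for n in range(terms):
        total += term
        term *= x / (a + n + 1.0)
    return math.exp(log_pref) * total

def asimov_cl_disc(b, s):
    """Product over channels of P(s_i+b_i, b_i) / P(s_i+b_i, s_i+b_i)."""
    out = 1.0
    for bi, si in zip(b, s):
        out *= reg_lower_gamma(si + bi, bi) / reg_lower_gamma(si + bi, si + bi)
    return out

def z_of(cl):
    """Significance from CL_disc = (1/2) erfc(Z / sqrt(2))."""
    return NormalDist().inv_cdf(1.0 - cl)

print(z_of(asimov_cl_disc([2.0, 1.0], [6.0, 3.0])))
```

The projected significance grows monotonically with the signal means, with no sawtooth behavior.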

E. Background uncertainty and other nuisance parameters
In the real world, the background level is never perfectly known. Furthermore, the background and signal may depend on other nuisance parameter(s), denoted ν below. These can be dealt with in a Bayesian approach by assuming probability densities f(b) and g(ν), subject to the normalization conditions ∫_0^∞ db f(b) = 1 and ∫ dν g(ν) = 1. For example, following [50], we can model the background uncertainty in terms of an on-off problem [68-73], where m is the number of Poisson events in a signal-off (background-only) region, and the ratio of background means in the signal-off and signal-on regions is called τ. In terms of m and τ, the point estimate for the background and its variance are

  b̂ = m/τ,   ∆_b² = m/τ²,    (2.52)

or equivalently m = b̂²/∆_b² and τ = b̂/∆_b², so that the probability density of b is

  f(b) = τ (τ b)^m e^{−τ b} / m!,    (2.54)

which is the posterior probability distribution for b obtained by using Bayes' theorem, with the Poisson likelihood P(m | τ b) for the background in the signal-off region and a flat prior for b. Note that this probability distribution can be used as a model even in situations where the estimates of the background and its uncertainty come partly or completely from theory rather than from signal-off region data.
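This setup can be cross-checked numerically. The sketch below implements the gamma-distribution posterior for b and the marginalized Poisson probability ∆P(n | b̂, ∆_b, s) = ∫ db f(b) P(n | s + b); the quadrature grid and the test values are our own illustrative choices, and m! is replaced by Γ(m + 1) so that non-integer m is allowed.

```python
# On-off background model: gamma posterior f(b) with m = bhat^2/db^2
# and tau = bhat/db^2, and the b-marginalized Poisson probability.
import math

def onoff_params(bhat, db):
    """Invert bhat = m/tau, db^2 = m/tau^2 to get (m, tau)."""
    return bhat**2 / db**2, bhat / db**2

def f_b(b, m, tau):
    """Posterior density tau*(tau*b)^m * exp(-tau*b) / Gamma(m+1)."""
    return math.exp((m + 1.0) * math.log(tau) + m * math.log(b)
                    - tau * b - math.lgamma(m + 1.0))

def delta_p(n, bhat, db, s, steps=4000):
    """Marginalized P(n | s+b), by midpoint-rule quadrature over b."""
    m, tau = onoff_params(bhat, db)
    bmax = bhat + 12.0 * db + 5.0
    h = bmax / steps
    total = 0.0
    for i in range(steps):
        b = (i + 0.5) * h          # midpoint avoids the b = 0 endpoint
        mu = s + b
        total += f_b(b, m, tau) * math.exp(-mu + n * math.log(mu)
                                           - math.lgamma(n + 1.0)) * h
    return total

# DeltaP is a normalized probability distribution in n:
print(sum(delta_p(n, 3.0, 1.0, 2.0) for n in range(60)))
```

As ∆_b → 0 the posterior becomes sharply peaked at b̂ and ∆P(n | b̂, ∆_b, s) approaches the plain Poisson probability P(n | s + b̂), which provides a useful sanity check.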
In the case of eq. (2.54), the probability for observing n events in the signal-on region is obtained by averaging over b [71-75], to obtain

  ∆P(n | b̂, ∆_b, s) = ∫_0^∞ db f(b) P(n | s + b).

We can then extend the definitions of the frequentist p-values to the uncertain-background case by simply replacing the Poisson probability P(n | s + b) with ∆P(n | b̂, ∆_b, s) [51]. Explicit formulas for ∆P(n | b̂, ∆_b, s), p_excl(n | b̂, ∆_b, s), and p_disc(n | b̂, ∆_b) can be found in eqs. (12)-(15) of ref. [50]. Besides these, we note the simple formula of eq. (2.58). Similarly, the confidence levels discussed in the previous sections can be obtained in the uncertain-background case by the same replacement. Note that we retain the property CL_excl = CL_s in the single-channel case with non-zero background uncertainty. The exact Asimov expectations for p_excl, CL_s = CL_excl, and for p_disc, CL_disc in the uncertain-background case are obtained by replacing n in the preceding equations by its expected mean in each case, n → b̂ + ∆_b²/b̂ for exclusion and n → s + b̂ + ∆_b²/b̂ for discovery, respectively. More generally, for any probability distributions f(b_i) and g(ν) for the backgrounds and other nuisance parameters, one can marginalize (integrate) over the b_i and ν. In the case of exclusion, eq. (2.28) generalizes to

  p_excl = ∫ dν g(ν) [Π_i ∫_0^∞ db_i f(b_i)] Σ_{{k_i}} Π_i P(k_i | s_i + b_i),

and similarly for eq. (2.30), which then gives CL_s. However, the sum over {k_i} is subject to the restriction eq. (2.35), so that even the number of terms in the sum depends in a discontinuous way on ν and the b_i as we integrate over them in the multi-channel case. Ref. [76] contains a discussion of various ways to account for the uncertainties in the background and nuisance parameters in frequentist methods. As argued above, we prefer instead to generalize eq. (2.36), resulting in:

  CL_excl = (1/D) ∫ dν g(ν) Π_i ∫_0^∞ db_i f(b_i) Σ_{k_i=0}^{n_i} P(k_i | s_i + b_i).    (2.65)

Here we have used a short-hand notation, to be used several times below, such that the normalization factor D is equal to the expression that follows it with s = 0. Similarly, in the case of discovery in the presence of background uncertainties and nuisance parameters, we can generalize eq.
(2.47) to obtain

  p_disc = ∫ dν g(ν) [Π_i ∫_0^∞ db_i f(b_i)] Σ_{{k_i}} Π_i P(k_i | b_i),

this time subject to the constraint eq. (2.48) on the terms in the sum. However, as argued above, we prefer to use the more conservative

  CL_disc = D / [∫ dν g(ν) Π_i ∫_0^∞ db_i f(b_i) Σ_{k_i=n_i}^{∞} P(k_i | s_i + b_i)].    (2.67)
To obtain the Asimov results, one can substitute the mean expected values for the n_i, namely n_i → b̂_i + ∆²_{b_i}/b̂_i for the exclusion case and n_i → s_i + b̂_i + ∆²_{b_i}/b̂_i for the discovery case.

III. APPLICATION TO PROTON DECAY
In this section, we will first consider the application of the Bayesian statistic CL excl to estimate the current lower limits on proton partial lifetimes in p → νK + and p → e + π 0 modes, based on Super-Kamiokande's data, at various confidence levels generalizing the 90% CL published limits. We will then consider the prospects for exclusion or discovery of these proton decay modes for several planned future neutrino experiments: DUNE [46], JUNO [47], Hyper-Kamiokande [48], and THEIA [49]. We do this by applying the Bayesian approach of using CL excl and CL disc with the exact Asimov criterion of replacing the observed counts by their respective expected means.
As discussed above, the Bayesian approaches CL excl for exclusion and CL disc for discovery are ideal methods to obtain these limits and projections, as they: 1) guard against claiming exclusion (or discovery) when an experiment is actually not sensitive to the signal model, and therefore are more conservative than the frequentist p excl and p disc ; 2) are well-behaved in multi-channel counting experiments in the sense that, unlike the (modified) frequentist approach, CL excl and CL disc are not overly affected by the presence of non-informative channels and do not have any discontinuities as the signal and background means are varied; 3) are easily able to include uncertainties in the backgrounds and the signal selection efficiencies, especially for multi-channel counting experiments.
The estimates for the backgrounds and the signal selection efficiencies in a specific proton decay mode have been obtained by the DUNE, JUNO, and THEIA collaborations by modeling the experiments as single-channel counting experiments, whereas Hyper-Kamiokande searches for proton decay are modeled as multi-channel counting experiments based on the signal regions and search strategies used at Super-Kamiokande. Before we present our results, we first review the methods we employ to obtain the limits/projections for proton partial lifetimes at single-channel and multi-channel counting experiments, based on the methods elucidated in section II.
The number of decays in a specific decay channel at an experiment with N_0 initial protons over a runtime ∆t is given by

  ∆N = N_0 Γ ∆t,    (3.1)

where the proton partial width Γ is extremely small. (More generally, ∆N = N_0 (1 − e^{−Γ∆t}).) Therefore the signal can be computed as

  s = ε N_0 Γ ∆t,    (3.2)

where 0 ≤ ε ≤ 1 is the signal selection efficiency. In terms of the number of protons per kiloton of detector material, N_p, and the exposure λ (= runtime × number of kilotons of detector material) of the experiment in units of kiloton-years, we can re-express eq. (3.2) as

  s = N_p λ ε Γ.    (3.3)

The present exclusion limit at confidence level 1 − α for the proton partial lifetime is then provided by [77]

  τ_p = 1/Γ = N_p λ ε / s,    (3.4)

where s is the number of signal events that gives CL_excl equal to α. For a future experiment, the exclusion reach for the proton partial lifetime at confidence level 1 − α is given by the same formula, eq. (3.4), where s is now the signal that makes the exact Asimov CL^A_excl equal to α. The discovery reach for a given significance Z is likewise obtained from eq. (3.4), using the s that provides CL^A_disc = (1/2) erfc(Z/√2). Eq. (3.4) holds for an experiment with a single search channel with known background b and signal selection efficiency ε. For the more general case of an experiment with one or more independent search channels with possibly uncertain backgrounds and signal efficiencies, we employ a Bayesian approach to obtain the limit/reach for the proton partial lifetime, as discussed above. First, for the exclusion case, given the number of observed events n_i in each search channel labeled i, the upper limit on the proton partial width at confidence level 1 − α is obtained by solving for Γ in (see eq. (2.65), and ref. [78]):

  α = (1/D) Π_i ∫_0^1 dε_i g(ε_i) ∫_0^∞ db_i f(b_i) Σ_{k_i=0}^{n_i} P(k_i | s_i + b_i).    (3.5)

Here, D is a normalization factor, defined to equal the expression that follows it evaluated at Γ = 0, and in each search channel labeled by i, the signal is

  s_i = N_p λ_i ε_i Γ,    (3.6)

where g(ε_i) and f(b_i) are the probability distributions for the signal efficiency ε_i and the background b_i.
These distributions can take different forms to parameterize our imperfect knowledge of the efficiencies and backgrounds, subject to ∫_0^1 dε_i g(ε_i) = 1 and ∫_0^∞ db_i f(b_i) = 1. For example, the probability distribution of the true signal selection efficiency ε_i might be taken to be a Gaussian with central value ε̂_i and standard deviation ∆_{ε_i}, truncated to 0 ≤ ε_i ≤ 1, as in the Super-Kamiokande search analyses in refs. [44,45]. The probability distribution of the true background can be taken to be given by eq. (2.54), as in the on-off problem, in terms of quantities m_i and τ_i related to the central value b̂_i and variance ∆²_{b_i} by eq. (2.52). Eq. (3.5) assumes that the search channels are independent. If the backgrounds and the signal selection efficiencies are perfectly known, the distributions g(ε_i) and f(b_i) become delta functions and the integrals in eq. (3.5) collapse.
This corresponds to eq. (2.36). Specializing further to a single search channel (and dropping the subscript i), it reduces to eq. (2.14) with s = N_p λ ε Γ. For projecting the exclusion reach for the partial lifetime at future experiments, we make use of the exact Asimov method, replacing the number of events n_i in each search channel by its expected mean,

  n_i → b_i = (m_i + 1)/τ_i = b̂_i + ∆²_{b_i}/b̂_i,    (3.9)

if the on-off treatment is used for the background. The expected confidence level 1 − α upper limit on the partial width Γ is then solved from eq. (3.5) with n_i replaced by b_i (eq. (3.10)). Equation (3.10) gives the Asimov expected lower limit on the partial lifetime via τ_p = 1/Γ. For the expected discovery reach for proton partial widths at future experiments, we use a method based on the exact Asimov evaluation of the statistic CL_disc. In particular, we solve for Γ from the generalization of eq. (2.67) given in eq. (3.11), where s_i = N_p λ_i ε_i Γ and n_i = s_i + b_i, with b_i as given in eq. (3.9) and the expected signal events defined in eq. (3.12). This gives the expected discovery reach for the partial lifetime, using τ_p = 1/Γ, corresponding to a chosen significance Z. Based on Super-Kamiokande's data taken from refs. [44,45], which we quote for completeness in Table 3.1, we now compute the upper limits on the proton partial widths in the p → νK+ and p → e+ π0 decay modes that are excluded at various confidence levels (e.g. 95%, 90%, 68%, 50% CL) using eq. (3.5); these can then be translated into corresponding lower limits on the proton partial lifetime. Super-Kamiokande uses a water Cerenkov detector with a fiducial mass of 22.5 ktons, and the analysis for p → e+ π0 in ref. [45] also includes data from an enlarged fiducial mass of 27.2 ktons. While Super-Kamiokande can probe proton decay in both the p → νK+ and p → e+ π0 decay modes, it is less sensitive to the former, because the K+ is produced below its Cerenkov threshold in water and is only identified from its decay products.
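For the single-channel special case with known background and efficiency, the limit-setting procedure of eqs. (2.14) and (3.4) can be sketched as follows. All numerical inputs here are illustrative placeholders, not Super-Kamiokande's actual values.

```python
# Single-channel exclusion limit on the proton partial lifetime:
# bisect for the signal s with CL_excl(n, b, s) = alpha, then apply
# tau_p = N_p * lambda * eps / s.
import math

def pois(k, mu):
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def cl_excl(n, b, s):
    num = sum(pois(k, s + b) for k in range(n + 1))
    den = sum(pois(k, b) for k in range(n + 1))
    return num / den

def s_upper(n, b, alpha, s_hi=1000.0):
    """Bisect for the signal s at which CL_excl(n, b, s) = alpha;
    CL_excl is monotonically decreasing in s."""
    lo, hi = 0.0, s_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cl_excl(n, b, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Np = 3.34e32    # protons per kton of water (composition estimate)
lam = 450.0     # exposure in kton-years (illustrative)
eps = 0.4       # signal efficiency (illustrative)
s90 = s_upper(n=0, b=0.5, alpha=0.10)
tau_p = Np * lam * eps / s90
print(s90, tau_p)
```

For n = 0 the statistic collapses to CL_excl = e^{−s}, so the 90% CL upper limit is s = ln 10 ≈ 2.30 events regardless of the background, which is a convenient check on the bisection.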
Figure 3.1 shows our computed estimates of the current confidence levels for the exclusion of proton decay at Super-Kamiokande in the p → νK+ (left panel) and p → e+ π0 (right panel) channels, as a function of the proton partial lifetime in the respective decay channels. This generalizes the results presented by the Super-Kamiokande collaboration, which were given only for 90% CL exclusions.

TABLE 3.1: Super-Kamiokande's data for the p → νK+ and p → e+ π0 decay modes, taken from refs. [44] and [45], respectively. In each decay mode, the exposures λ_i in kton-years, total backgrounds b̂_i ± ∆_{b_i}, signal efficiencies ε̂_i ± ∆_{ε_i}, and the observed numbers of counts n_i are listed. The s^{90CL}_i are the expected signal events, defined in eq. (3.12), for the proton partial lifetime set equal to its 90% CL lower limit. The last column gives a brief description of each channel, referring to the detector period (SK I-IV) and the name of the search method used in refs. [44,45].

From the data in Table 3.1, we estimated the current lower limits on the proton partial lifetimes to be

  τ_p/Br(p → νK+) > 6.6 × 10^33 years at 90% CL,    (3.13)
  τ_p/Br(p → e+ π0) > 2.4 × 10^34 years at 90% CL.    (3.14)

In comparison, the published 90% CL exclusion limits on proton partial lifetimes from the Super-Kamiokande collaboration are

  τ_p/Br(p → νK+) > 5.9 × 10^33 years at 90% CL (SuperK 2014 [44]),    (3.15)
  τ_p/Br(p → e+ π0) > 2.4 × 10^34 years at 90% CL (SuperK 2020 [45]),    (3.16)

shown as the shaded red regions in Figure 3.1. We see that in the case of p → νK+, our† estimate for the 90% CL limit is slightly stronger (6.6 × 10^33 years rather than 5.9 × 10^33 years) than the journal-published limit in ref. [44]. In this paper, we only consider the limits from data published in journal articles. In the case of p → νK+, there is more data from the continuation of run SK-IV that was not used for the published limit in ref. [44]. It is therefore quite possible that a future limit, based on data already taken, will be stronger.
In the case of p → e+ π0, our estimate for the 90% CL limit agrees perfectly with the Super-Kamiokande published limit in ref. [45]. We now discuss projections for exclusion and discovery of proton decay at the possible future neutrino detectors DUNE, JUNO, Hyper-Kamiokande, and THEIA. Both DUNE and JUNO will primarily be searching for proton decay in the p → νK+ decay mode. For these searches, DUNE uses its far detector, with a total of 40 kilotons (kton) fiducial mass of liquid argon [46], which can track and reconstruct charged kaons with high efficiency, and JUNO uses its central detector, with a 20 kton fiducial mass of liquid scintillator [47]. On the other hand, Hyper-Kamiokande [48] uses a water Cerenkov detector with 186 ktons of fiducial mass and is sensitive to both the p → νK+ and p → e+ π0 decay modes, among others. As was the case with Super-Kamiokande, Hyper-Kamiokande will be more sensitive to the p → e+ π0 mode than to the p → νK+ mode, due to much better reconstruction of the Cerenkov rings of the positron and the electromagnetic showers from π0 → γγ.

† Besides using the probability distribution for the true background as in the on-off problem (eq. (2.54)), we have considered various other distributions, such as a Gaussian, and a convolution of Gaussian and Poisson (only for search channels with extremely low backgrounds) as done in refs. [44,45], but there was no noticeable change in our results. In Super-Kamiokande's analysis of the p → νK+ decay mode in ref. [44], the search channels with large backgrounds referred to as "p_µ spectrum" in Table 3.1 were further divided into sub-channels, but due to insufficient published data we are not able to include that subdivision.

[Figure 3.1 caption fragment: computed using the data in Table 3.1. The red shaded regions correspond to Super-Kamiokande's published exclusions on proton partial lifetimes at 90% CL, from [44] and [45].]

THEIA is a
new detector concept with water-based liquid scintillator (10% liquid scintillator and 90% water) that will be able to detect and distinguish between the Cerenkov and the scintillation light [49]. Here, we project sensitivities for both THEIA-25 and THEIA-100, with fiducial masses of 17 and 80 ktons, respectively, as considered in ref. [49]. Due to its ability to detect both Cerenkov signals and scintillation signals from charged particles such as the K+ produced below its Cerenkov threshold, the THEIA detector aims to have enhanced sensitivity to the p → νK+ mode [49], while also being able to probe the p → e+ π0 mode [79]. The numbers of protons per kiloton of detector material are given in eq. (3.17). For the purposes of projecting sensitivities for THEIA and JUNO, we took the liquid scintillator in both detectors to have 6.75 × 10^33 protons per 20 kilotons, based on ref. [47]. Figure 3.2 shows the runtimes at DUNE that are required for an expected 90% CL exclusion (first panel) and Z = 3 evidence (second panel) in the p → νK+ decay mode, as a function of the background rate per megaton-year of exposure. The colored lines and bands correspond to various choices of the proton partial lifetime. For the purposes of illustration, we chose a signal selection efficiency ε = 40 ± 10%, which is plausible based on the various signal selection efficiencies considered in refs. [66,80-84]. The solid lines in the figure assume ε = 40%, and the shaded bands surrounding them vary this by ±10%. The required runtimes ∆t in the figure are obtained using eq. (3.4), which gives

  ∆t = s τ_p / (ε N_p N_kton),    (3.18)

where N_kton is the number of kilotons of detector material, and s is the upper limit on the signal for 90% CL exclusion, obtained by setting CL^A_excl (as in eq. (2.19)) equal to 0.1, or the signal needed for Z = 3 evidence, obtained by setting CL^A_disc (as in eq. (2.21)) equal to 0.00135.
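The runtime estimate of eq. (3.18), which follows from s = N_p λ ε Γ with λ = N_kton ∆t and τ_p = 1/Γ, can be cross-checked with a short sketch. The proton counts per kiloton below are standard composition estimates (the liquid-scintillator value follows the 6.75 × 10^33 per 20 kton figure from ref. [47]); the sample inputs at the bottom are illustrative, not the experiments' actual values.

```python
# Runtime needed to accumulate signal s for a target partial lifetime:
# dt = s * tau_p / (eps * N_p * N_kton).
AVOGADRO = 6.022e23

def protons_per_kton(molar_mass_g, protons_per_unit):
    """Protons in 1 kton (= 1e9 g) of a material."""
    return 1.0e9 / molar_mass_g * AVOGADRO * protons_per_unit

NP_WATER = protons_per_kton(18.015, 10)   # H2O, ~3.34e32 per kton
NP_ARGON = protons_per_kton(39.948, 18)   # Ar,  ~2.71e32 per kton
NP_LS = 6.75e33 / 20.0                    # liquid scintillator [47]

def runtime_years(s, tau_p, eps, n_p, n_kton):
    """Years of running needed for signal s, given lifetime tau_p."""
    return s * tau_p / (eps * n_p * n_kton)

# e.g. s = 2.3 signal events, tau_p = 1e34 yr, eps = 40%, 40 kton of argon:
print(runtime_years(2.3, 1.0e34, 0.4, NP_ARGON, 40.0))
```

The printed runtime comes out to a few years for these inputs, illustrating the scale of the exposures involved.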
As discussed at the end of Section II B, the zero-background limit for the discovery case is not well defined, in the sense that at b = 0, any non-zero signal, albeit arbitrarily small, would yield an infinite significance. Therefore, to be conservative, we require that the mean expected number of signal events s is at least 1 in order to have an expected discovery. The dashed lines at very small b/Mton-year in the lower-left corner of the bottom panel (the discovery case) of Figure 3.2 correspond to this additional requirement that s ≥ 1. It is clear from the figure that if the estimated background per megaton-year of exposure at DUNE increases, the required runtime increases more steeply for discovery than for exclusion.
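This zero-background bookkeeping amounts to a one-line check: with b = 0 and mean expected signal s, the probability of seeing at least one event is P(n ≥ 1) = 1 − e^{−s}, and inverting gives the signal needed for any chosen P(n ≥ 1).

```python
# Zero-background probability of at least one event, and its inverse
# (the signal requirement corresponding to a chosen P(n >= 1)).
import math

def p_at_least_one(s):
    return 1.0 - math.exp(-s)

def signal_for_prob(p):
    return -math.log(1.0 - p)

print(round(p_at_least_one(1.0), 4))   # → 0.6321
```

This reproduces the ~63.2% figure quoted in the text for s = 1, and the inverse is what underlies the labeled curves discussed below for Figure 3.5.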
In Figure 3.3, we show the expected 90% CL exclusion reach (first panel) and the expected Z = 3 evidence reach (second panel) for the proton partial lifetime in the p → νK+ decay channel at DUNE, as a function of the runtime in years. The three colored lines/bands correspond to various assumed background rates per megaton-year of exposure, taken from refs. [46,66,80-84]. The signal selection efficiency is again taken to be ε = 40% (solid colored lines) ± 10% (shaded bands). The signals computed by setting eq. (2.19) equal to 0.1 and eq. (2.21) equal to 0.00135 are plugged into eq. (3.4) to obtain the expected 90% CL exclusion and Z = 3 evidence reaches for the proton partial lifetime, respectively. The black dashed curves correspond to a very optimistic scenario with b = 0 and ε = 46% [66], using the requirement s = 1 in the discovery case (bottom panel). Also shown in Figure 3.3 and other figures below are horizontal lines at our previously mentioned estimates of the current 95%, 90%, 68%, and 50% CL exclusion limits based on Super-Kamiokande's data from 2014 [44].
The usual standard for discovery in particle physics is a significance of Z = 5. Therefore, we show in Figure 3.4 the expected reach for Z = 5 in the p → νK+ channel at the 40 kton DUNE, as a function of the runtime. We note that even after 25 years, the discovery reach in this channel with nominal background rates remains below the value of τ_p(p → νK+) that we estimate to be excluded at 50% CL by the Super-Kamiokande data already published in 2014. Of course, a 50% CL exclusion is far from definitive, but this indicates the challenge being faced. This could change if the background can be reduced to near 0, as indicated by the dashed line, while maintaining a high efficiency for the signal. As noted above in Section II B, if the mean expected number of signal events is s = 1, and one makes the optimistic assumption that the background is completely negligible (b = 0), then the probability of obtaining at least one event is about 63.2%. Figure 3.5 shows the value of τ_p/Br(p → νK+), as a function of the runtime, that would give various other probabilities of obtaining at least one event, again with the very optimistic assumptions of absolutely no background (b = 0) and ε = 46% [66]. Each of these choices for P(n ≥ 1) is equivalent to a requirement on the signal s, as labeled in the figure.
Ref. [82] also provided a preliminary estimate for the background and signal efficiency for proton decay search in p → e + π 0 mode at DUNE. Although DUNE is most sensitive to p → νK + mode, for completeness, we will also show our expected reach estimates for proton partial lifetime in p → e + π 0 mode at DUNE after 10 years and 20 years of runtime in our summary plots in Figures 4.1 and 4.2 in the Outlook section below.
We now turn to projections for JUNO, with 20 ktons of liquid scintillator. We again obtain the upper limit on the signal using eq. (2.19) for the exclusion reach, and the signal needed for discovery using eq. (2.21) for the discovery reach, and then apply eq. (3.4). Figure 3.6 shows the proton partial lifetime in the p → νK+ decay channel that is expected to be excluded at 90% or 95% CL (top panel) or discovered at Z = 3 or Z = 5 significance (bottom panel) at JUNO, as a function of runtime. For comparison, our estimates of the current 95%, 90%, 68%, and 50% CL exclusion limits on the proton partial lifetime, based on Super-Kamiokande's data from 2014 [44], are shown as horizontal dashed lines. For projected exclusion sensitivities, both the DUNE [66] and JUNO [47] experiments made use of the Feldman-Cousins (FC) method [62] to obtain the upper limit on the signal assuming a fixed number of observed events, e.g., n = 0. This approach can be problematic for projections, because the FC upper limits at fixed n decrease with b (as can be seen from Figure 2.3), which can imply that the expected sensitivity of the experiment gets better as the background increases. Also considered in ref. [66] is the use of the FC method with n = b. For integer values of b, the FC upper limit with n = b sensibly increases as the background increases. But for non-integer b, n is still an integer, and the FC upper limit with n = round(b) does not always increase with b, as shown above in Figure 2.3.

[Figure 3.6 caption fragment: results are shown for combinations of background rates per year and signal selection efficiencies (b/year, ε) from refs. [85,86], as labeled. Our estimates of the current 95%, 90%, 68%, and 50% CL exclusion limits on the proton partial lifetime, based on Super-Kamiokande's data from 2014 [44], are shown as horizontal dashed lines.]

[Table 3.2 caption fragment: backgrounds and signal efficiencies, taken from ref. [48], for the p → νK+ and p → e+ π0 decay modes. The last column gives a brief description of each channel, referring to the name of the search method used in ref. [48].]
Exposure in each channel for a 186 kton Hyper-Kamiokande is given by λ i = 0.186 Mton × runtime in years. We next turn to projections for Hyper-Kamiokande. Figures 3.7 and 3.8 show our estimates for the proton partial lifetimes in p → νK + and p → e + π 0 decay channels, respectively, that are expected to be excluded at 90% or 95% CL (top panels) or discovered at Z = 3 or Z = 5 significance (bottom panels), as a function of runtime at Hyper-Kamiokande. In order to obtain the exclusion and discovery reaches for τ p , the upper limit on partial width and the partial width needed for discovery are solved from eqs. (3.10) and (3.11), respectively. These equations are used to combine the independent search channels in each decay mode, based on the background means and the signal selection efficiencies, along with their uncertainties, given in ref. [48] and summarized in our Table 3.2. Figures 3.7 and 3.8 also show our previously discussed estimates of the current exclusion limits at 95%, 90%, 68%, 50% CL in p → νK + and p → e + π 0 decay modes based on the data from refs. [44] and [45], respectively.
Finally, we turn to projections for THEIA. In Figures 3.9 and 3.10, we show the expected reaches, as a function of runtime, for the proton partial lifetime in the p → νK+ and p → e+ π0 decay modes, respectively, for 90% or 95% CL exclusion (top panels) and discovery at Z = 3 or Z = 5 significance (bottom panels).

[Figure caption fragments: ... as given in Table 3.2, taken from ref. [48]. Our estimates of the current 95%, 90%, 68%, and 50% CL exclusion limits on the proton partial lifetime, based on Super-Kamiokande's data from 2020 [45], are shown as horizontal dashed lines. ... Expected reaches [via eq. (3.4)] for the proton partial lifetime in p → e+ π0 with THEIA-25 (red lines) and THEIA-100 (blue lines), with 17 and 80 ktons of water-based liquid scintillator, respectively, as a function of runtime. The estimates for the background (per megaton-year of exposure) and the signal efficiencies are taken from ref. [79]. Our estimates of the current 95%, 90%, 68%, and 50% CL exclusion limits, based on Super-Kamiokande's data from 2020 [45], are shown as horizontal dashed lines.]
The background rates per megaton-year of exposure and the signal selection efficiencies for the decay modes p → νK+ and p → e+ π0 are taken from refs. [49] and [79], respectively. As before, we also show our estimates for the current lower limits at various confidence levels, based on the data from Super-Kamiokande [44,45].

IV. OUTLOOK
We summarize our projections for future proton decay searches in the final states e+ π0 and νK+ at DUNE, JUNO, and Hyper-Kamiokande in Figure 4.1 for exclusion (assuming the signal is indeed absent), and in Figure 4.2 for discovery (assuming the signal is actually present). And in Figure 4.3 we summarize our projections at THEIA for 95% CL exclusion and Z = 5 discovery for various fiducial masses N_kton = 10, 25, 50, and 100 kton. In each case, we show results for 10 years and 20 years of runtime. The assumed backgrounds and signal efficiencies for DUNE†, JUNO, and THEIA in each proton decay mode are labeled in the plots, while the corresponding information for the multi-channel Hyper-Kamiokande searches was given in Table 3.2 above, quoted from ref. [48]. The vertical dashed lines correspond to our estimates of the current 90% or 95% CL lower limits on the proton partial lifetime in the respective decay channels, as labeled in each figure, based on the published Super-Kamiokande data [44,45].
As noted above, our projections here are based on the exact Asimov evaluation of the Bayesian statistics CL_excl and CL_disc. Our results are somewhat more conservative than the previous projections appearing in ref. [48] and the Snowmass report [79], which we have generalized to include 90% CL exclusion and Z = 3 evidence reach estimates as a function of runtime (for various estimates of backgrounds and signal efficiencies, notably for DUNE and JUNO), as well as estimates for 95% CL exclusion and Z = 5 discovery. In the cases of the single-channel searches at DUNE, JUNO, and THEIA, we have also investigated the use of the exact Asimov frequentist p-value measures p_excl and p_disc. These results are not shown in the figures; we find that they are only slightly less conservative than the estimates shown.
The two panels of Figure 4.1 show the projected exclusion reaches at 90% and 95% confidence level, while the two panels of Figure 4.2 give the projected reaches for Z = 3 evidence and Z = 5 discovery at DUNE, JUNO, and Hyper-Kamiokande. And the top (bottom) panel of Figure 4.3 shows the projected 95% CL exclusion (Z = 5 discovery) reaches at THEIA with various fiducial masses of the detector material. As expected, for each planned experiment the reaches for exclusion are substantially higher than the corresponding reaches for a possible discovery. We note that the prospects for a definitive Z = 5 discovery

† For projections at DUNE in the p → νK+ channel, we are using the optimistic choices based on ref. [80].
More pessimistic choices from refs. [81][82][83][84] will of course lead to lower reach estimates. 95% CL exclusion reach for 10+10 years runtime Current 95% CL limit p → e + π 0 p → νK + FIG. 4.1: Expected exclusion reaches at 90% CL (top panel) and 95% CL (bottom panel) for proton partial lifetime in p → e + π 0 (blue bars) and p → νK + (red bars) decay channels at JUNO, DUNE, and Hyper-Kamiokande after 10 years (darker shading) and 20 years (lighter shading) of runtime. The assumed backgrounds and signal efficiencies for JUNO and DUNE are labeled in the plots, and for Hyper-Kamiokande, the corresponding information is given in Table 3.2, quoted from ref. [48]. These results are based on preliminary estimates of the backgrounds and signal efficiencies, which are likely to change as the experiments progress, and therefore should be viewed with some caution as comparisons. The vertical dashed lines are our estimates of the current 90% CL (top panel) and 95% CL (bottom panel) lower limits based on Super-Kamiokande's data from 2014 [44] and 2020 [45].
are particularly modest after one takes into account the limits already obtained by Super-Kamiokande.
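For reference, the significance $Z$ used throughout and the corresponding one-sided Gaussian tail probability are related by $Z = \Phi^{-1}(1 - p)$, so that $Z = 3$ and $Z = 5$ correspond to $p \approx 1.3 \times 10^{-3}$ and $p \approx 2.9 \times 10^{-7}$, respectively. A minimal conversion sketch:

```python
from scipy.stats import norm

def z_from_p(p):
    """One-sided Gaussian significance Z corresponding to tail probability p."""
    return norm.isf(p)

def p_from_z(z):
    """Tail probability corresponding to significance Z."""
    return norm.sf(z)

print(z_from_p(0.10))  # ~1.28: one-sided significance of a 90% CL exclusion
print(p_from_z(5.0))   # ~2.9e-7: tail probability for a Z = 5 discovery
```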
The results shown in Figures 4.1, 4.2, and 4.3 are preliminary estimates, as the presently available background and signal efficiency estimates vary significantly in their reliability, and more robust estimates will become available only when the experiments are closer to collecting data. For the same reason, the results should be viewed with some caution as a direct comparison of the different experiments, which are at very different stages of planning and development.

FIG. 4.2: Expected reaches for $Z = 3$ evidence (top panel) and $Z = 5$ discovery (bottom panel) for the proton partial lifetime in $p \to e^+ \pi^0$ (blue bars) and $p \to \overline\nu K^+$ (red bars) decay channels at JUNO, DUNE, and Hyper-Kamiokande after 10 years (darker shading) and 20 years (lighter shading) of runtime. The assumed backgrounds and signal efficiencies for JUNO and DUNE are labeled in the plots; for Hyper-Kamiokande, the corresponding information is given in Table 3.2, quoted from ref. [48]. These results are based on preliminary estimates of the backgrounds and signal efficiencies, which are likely to change as the experiments progress, and therefore should be viewed with some caution as comparisons. The vertical dashed lines are our estimates of the current 90% CL lower limits based on Super-Kamiokande's data from 2014 [44] and 2020 [45].
FIG. 4.3: Expected 95% CL exclusion (top panel) and $Z = 5$ discovery (bottom panel) reaches at THEIA for the proton partial lifetime in $p \to e^+ \pi^0$ (blue bars) and $p \to \overline\nu K^+$ (red bars) decay channels with various fiducial masses, as labeled, after 10 years (darker shading) and 20 years (lighter shading) of runtime. The assumed background rates and signal efficiencies for THEIA are labeled in the plots. These results are based on preliminary estimates of the backgrounds and signal efficiencies, which are likely to change as the experiment progresses. The vertical dashed lines are our estimates of the current 95% CL (top panel) and 90% CL (bottom panel) lower limits based on Super-Kamiokande's data from 2014 [44] and 2020 [45].

Proton decay experiments prior to Super-Kamiokande have ruled out the simplest variations of the minimal SU(5) GUT [5], and Super-Kamiokande has seemingly ruled out the minimal supersymmetric SU(5) GUT [18-21] with sfermion masses less than around the TeV scale. However, there are many other well-motivated GUT models that predict proton partial lifetimes well beyond the current lower limits (see the summary tables in refs. [77,79] and references therein). For example, non-supersymmetric GUTs such as some minimally extended SU(5) models [7,8] and the minimal SO(10) model [14] predict $p \to e^+ \pi^0$ to be the dominant decay mode, with partial lifetimes of order $10^{32}$-$10^{36}$ years and $5 \times 10^{35}$ years, respectively. Supersymmetric SU(5) GUTs predict the proton partial lifetime for the leading mode $p \to \overline\nu K^+$ to be $3 \times 10^{34}$-$2 \times 10^{35}$ years in the minimal supergravity framework (MSUGRA) and $3 \times 10^{34}$-$10^{36}$ years in supergravity models with non-universal gaugino masses (NUSUGRA), as discussed in ref. [26] in light of the observed Higgs mass. Ref. [22] revisited the minimal supersymmetric SU(5) GUT and obtained $\tau_p/{\rm Br}(p \to \overline\nu K^+)$ in the range $(2-6) \times 10^{34}$ years, assuming universality of the soft supersymmetry breaking parameters at the GUT scale with sfermion masses less than around $\mathcal{O}(10)$ TeV.
There are also supersymmetric GUTs, such as split SU(5) supersymmetry [31] and flipped SU(5) supersymmetric GUTs [28-30], where the dominant decay mode can be $p \to e^+ \pi^0$ with lifetimes of order $10^{35}$-$10^{37}$ years.
From our estimates of the reaches summarized in Figures 4.1, 4.2, and 4.3, we can see that DUNE, JUNO, Hyper-Kamiokande, and THEIA can probe a significant fraction of the parameter space of various presently viable supersymmetric and non-supersymmetric GUTs and could eventually lead the way to a more complete theory.
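The translation between a partial-lifetime reach and an expected event count can be sketched as follows. For a partial lifetime $\tau/{\rm Br}$ much longer than the exposure, the mean signal yield is $s \simeq N_p\,\epsilon\,T/(\tau/{\rm Br})$, where $N_p$ is the number of protons in the fiducial volume, $\epsilon$ the signal efficiency, and $T$ the runtime. The sketch below assumes a water Cherenkov detector, counting all ten protons per H$_2$O molecule (roughly $3.3 \times 10^{32}$ protons per kton); the efficiency and lifetime values are illustrative placeholders, not the inputs used in the figures:

```python
# Rough expected-signal estimate for a water Cherenkov proton decay search.
N_A = 6.02214076e23  # Avogadro's number [1/mol]

# One kton of water = 1e9 g; H2O molar mass ~18.015 g/mol; 10 protons per molecule.
PROTONS_PER_KTON_WATER = 10 * 1.0e9 / 18.015 * N_A  # ~3.3e32 protons

def expected_signal(tau_over_br_years, fiducial_kton, efficiency, years):
    """Mean number of reconstructed signal events for partial lifetime tau/Br,
    valid in the limit tau/Br >> runtime (exponential decay linearized)."""
    n_protons = fiducial_kton * PROTONS_PER_KTON_WATER
    return n_protons * efficiency * years / tau_over_br_years

# Illustrative inputs: tau/Br = 1e34 yr, 100 kton fiducial mass,
# 40% efficiency, 10 years of runtime.
print(expected_signal(1.0e34, 100.0, 0.40, 10.0))
```

Even order-of-magnitude estimates of this kind make clear why lifetime reaches scale roughly linearly with exposure in the background-free regime, but only much more slowly once the expected background becomes comparable to the signal.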
The existing code repository Zstats [67] has been updated with the various statistical measures of significance for counting experiments with multiple independent search channels investigated in this paper. The updates include the significances based on our proposed Bayesian-motivated measures CL$_{\rm disc}$ and CL$_{\rm excl}$, and their application to the statistical significances for proton decay at current and future neutrino detectors. To demonstrate the usage of the code, the repository also contains code snippets, in a Python notebook, that generate the data for each of the figures in this paper.