Dark Matter Model or Mass, but Not Both: Assessing Near-Future Direct Searches with Benchmark-free Forecasting

Forecasting the signal discrimination power of dark matter (DM) searches is commonly limited to a set of arbitrary benchmark points. We introduce new methods for benchmark-free forecasting that instead allow an exhaustive exploration and visualization of the phenomenological distinctiveness of DM models, based on standard hypothesis testing. Using this method, we reassess the signal discrimination power of future liquid Xenon and Argon direct DM searches. We quantify the parameter regions where various non-relativistic effective operators, millicharged DM, and magnetic dipole DM can be discriminated, and where upper limits on the DM mass can be found. We find that including an Argon target substantially improves the prospects for reconstructing the DM properties. We also show that only in a small region with DM masses in the range 20-100 GeV and DM-nucleon cross sections a factor of a few below current bounds can near-future Xenon and Argon detectors discriminate both the DM-nucleon interaction and the DM mass simultaneously. In all other regions only one or the other can be obtained.

Introduction.-Searching for the elusive Dark Matter (DM) particle has preoccupied physicists for many years [1,2]. Over the past decade, two-phase scintillator direct detection experiments [3,4] have found much success, with the LUX [5], XENON [6] and PANDA-X [7] collaborations providing the most stringent constraints to date on Weakly Interacting Massive Particles (WIMPs) in the GeV−TeV mass range. These experiments owe their success to their low-background nature and will continue to improve in sensitivity for many years to come. In the case of a DM detection, it should be possible to study the astro- and particle-physics properties of DM using a variety of detectors and detection methods (see for example Refs. [8][9][10][11][12] and many others), but the precise parameter regions in which these properties can actually be measured are often hard to quantify.
Exploring the prospects for discriminating between different DM-nucleon interactions usually relies on comparing a number of benchmark models [13][14][15][16][17][18][19]. However, the pair-wise comparison of different benchmark points in the model parameter space (be they DM couplings or masses) is time-consuming, does not scale well with the number of benchmark points, and is particularly problematic in high-dimensional parameter spaces. In direct detection, such a high-dimensional parameter space appears in the framework of non-relativistic effective field theory (NREFT) [20][21][22][23][24], in which the space of DM-nucleon interactions may have more than 30 dimensions and is therefore hard to explore [25][26][27]. With current techniques, it is hence difficult to study model degeneracies and the degeneracy-breaking power of future instruments in a reliable and exhaustive way. For such tasks, dedicated techniques are required.

* t.d.p.edwards@uva.nl
† b.j.kavanagh@uva.nl
‡ c.weniger@uva.nl
In this Letter, we introduce a new framework for studying the signal discrimination power of future detectors in a fundamentally benchmark-free way. The key questions we aim to address are: How many observationally distinct signals does a given model predict for a set of future experiments? How many of these signals are compatible with specific subsets of the signal model? And more specifically, in which regions of the parameter space is signal discrimination and parameter reconstruction possible?
We first summarize the basics of our approach. We then discuss the experiments and dark matter models that we consider in the current work. Finally, we show our results and conclude with a short discussion about possible future directions and applications. 1

Information Geometry.-Consider a New-Physics model M with a d-dim model parameter space, θ ∈ Ω_M ⊂ R^d, and a combination of future experiments X that are described by some likelihood function L_X(D|θ), where D denotes data. We expect that two model parameter points θ, θ' can be discriminated by experiments X if the parameter point θ' is inconsistent with Asimov data D = D̄(θ). More concretely, distinctiveness requires that the log-likelihood ratio

TS(θ') ≡ −2 ln [ L(D̄(θ)|θ') / L(D̄(θ)|θ) ] ≈ ||x(θ') − x(θ)||²   (1)

exceeds a threshold value r_α(M)². The threshold value depends on the chosen statistical significance, which we set here to α = 0.045 (2σ), as well as on the sampling distribution of TS(θ'). In the large-sample limit and under certain regularity conditions, this sampling distribution follows a χ²_k distribution with k = d degrees of freedom [28,29].

1 All code for the calculations throughout this paper can be found at https://github.com/tedwards2412/benchmark_free_forecasting/.
The last part of Eq. (1) is an approximation based on the 'euclideanized signal' method [30]: an embedding θ → x(θ) ∈ R^n into some, usually higher-dimensional, space with unit Fisher information matrix (n usually equals the total number of data bins). This approximation maps statistical distinctiveness onto Euclidean distances. Confidence regions in the model parameter space then correspond to hyperspheres of radius r_α(M) in the euclideanized signal space.
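As an illustration of this mapping, consider the simplest case of a Poisson likelihood with fixed expectations per bin: there a transformation of the form x_i = 2√μ_i (a special case of the construction in Ref. [30]) already has unit Fisher metric, and the squared Euclidean distance reproduces the Asimov log-likelihood ratio for nearby signals. The following sketch, with made-up bin expectations, is illustrative only and not the full construction used in the paper:

```python
import numpy as np

def euclideanized_signal(mu):
    """Map expected counts mu (per bin) to a point x with unit Fisher metric.

    For a pure Poisson likelihood without background uncertainties the
    transformation x_i = 2 * sqrt(mu_i) ensures that ||x - x'||^2
    approximates the log-likelihood-ratio test statistic for nearby signals.
    """
    return 2.0 * np.sqrt(np.asarray(mu, dtype=float))

def asimov_ts(mu_true, mu_alt):
    """Exact Asimov log-likelihood-ratio TS for independent Poisson bins."""
    mu_true = np.asarray(mu_true, dtype=float)
    mu_alt = np.asarray(mu_alt, dtype=float)
    return 2.0 * np.sum(mu_alt - mu_true + mu_true * np.log(mu_true / mu_alt))

# hypothetical expected counts for two nearby signal points
mu1 = np.array([100.0, 50.0, 20.0])
mu2 = np.array([104.0, 48.0, 21.0])

d2 = np.sum((euclideanized_signal(mu1) - euclideanized_signal(mu2)) ** 2)
ts = asimov_ts(mu1, mu2)
# d2 and ts agree to well within a few per cent for these nearby signals
```

In practice the embedding must also account for background components and their uncertainties, which is what the swordfish machinery referenced below handles.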
Often one is interested in sub-models S that are nested inside model M, and which are obtained by restricting M to a d'-dim subregion Ω_S ⊂ Ω_M. A parameter point θ of M leads to a signal that is distinct from any signal in sub-model S if

TS_S(θ) ≡ min_{θ'∈Ω_S} −2 ln [ L(D̄(θ)|θ') / L(D̄(θ)|θ) ] ≈ min_{θ'∈Ω_S} ||x(θ') − x(θ)||²   (2)

exceeds a certain threshold value r_α(S, M)². Here 'distinct' means that the composite null hypothesis S can be rejected for data D̄(θ). The sampling distribution of Eq. (2) follows in general a χ²_{k=d−d'} distribution. In the euclideanized signal space x, those parts of model M that lie within a 'shell' of thickness r_α(S, M) around the signal manifold of S cannot be discriminated from model S.
Finally, nuisance parameters can be accounted for by replacing the likelihood function in Eq. (1) with a profile likelihood, L(D|θ) = max_η L(D|θ, η) L_η(η), where the last factor can incorporate additional constraints on the nuisance parameters from data external to X.
Distinct signals.-To quantify the signal discrimination power of a set of future experiments X in the context of model M, we may define the figure of merit

ν_α^{M,X} ≡ Total number of signals from model M discriminable by experiments X.
More specifically, ν_α^{M,X} equals the maximum number of points that can populate the parameter space of M while remaining mutually distinct according to Eq. (1), assuming a significance level α. Any such set of points provides a complete sample of the characteristic phenomenological features of model M. Loosely speaking, the points correspond to a set of non-overlapping confidence contours, as shown in Fig. 1. Furthermore, when considering a sub-model S nested in M, we can define

ν_α^{M,X}(S) ≡ Number of signals from model M discriminable by experiments X, and consistent with model S.   (3)

With these definitions, the phenomenological distinctiveness of various regions in the parameter space of model M can be visualized using standard Venn diagrams. The technical definition of the measure ν_α^{M,X}(·), which is used in the subsequent examples, is given in Appendix A.
DM-nucleon interactions.-While direct detection is typically analysed in terms of the standard spin-dependent (SD) and spin-independent (SI) interactions [31], the range of possible signals is much broader. The framework of non-relativistic effective field theory (NREFT), which was introduced and developed in Refs. [20][21][22][23][24], aims to classify possible elastic DM-nucleon interactions and thus possible signals in DM-nucleus scattering experiments. The NREFT is realised as a power series in the DM-nucleus relative velocity v and the nuclear recoil momentum q, valid for non-relativistic DM (v ≪ c) and short-range interactions. The resulting operators give rise to a range of novel energy spectra [14,16,19,32], directional signals [33,34] and annual modulation signatures [35,36] in direct detection experiments. We here focus on the three operators O_1, O_4 and O_11 because they allow us to explore a diverse range of signals with only a small number of operators, leaving a more exhaustive exploration of the full NREFT parameter space for future studies using these techniques. The operator O_1 = 1_χ 1_N couples to nucleon number, while the operator O_4 = S_χ · S_N couples to nuclear spin, allowing us to explore the complementarity between nuclei of different size and spin [37]. The operator O_11 = i q · S_χ/m_N may arise as the leading-order interaction in certain scalar-mediated models [24]. Similar to O_1, it also couples to nucleon number and receives a coherent enhancement to the rate, but has a characteristic peaked recoil spectrum owing to an extra dσ/dE_R ∝ E_R scaling of the cross section [16]. This allows us to explore how well different recoil spectra can be discriminated in future experiments. Though the NREFT is a powerful framework, it cannot encompass all possible signals. In particular, in its original formulation [21] the NREFT cannot describe interactions which are mediated by light mediators. In this case, the typical momentum transfer is larger than the mediator mass and an expansion in q is no longer appropriate 3. The scenario in which this mediator is the Standard Model photon has been studied extensively [38,40,41]. DM may have a tree-level renormalisable coupling to the photon, e χ̄γ^μχ A_μ, though its charge must be much smaller than one [42]. Such millicharged DM has long-ranged, coherently-enhanced interactions with nuclei, with a differential cross section scaling as dσ/dE_R ∝ 1/E_R² [43,44]. Alternatively, DM may have non-zero electric and magnetic moments [45,46], particularly if it takes the form of a composite state, such as a Dark Baryon [47,48]. In the context of model discrimination, the more interesting interaction for us is the DM magnetic dipole moment, (μ_χ/2) χ̄σ^{μν}χ F_{μν}, which leads to both long-range and short-range contributions to the rate, arising from charge-dipole and dipole-dipole interactions respectively [41,49,50].

[FIG. 3 caption: To the left/below each broken line, it is not possible to discriminate an O_1 signal (with the indicated cross section and DM mass) from the corresponding best-fit O_4, O_11 or magnetic dipole signal. Above/right of each broken line, such a discrimination is possible with at least 2σ significance. The statistical argument is based on conventional composite hypothesis testing (see text for details). Solid lines display the 90% CL limits for the XENON1T-2017 and XENONnT experiments. All lines account for DM halo uncertainties. Note that the region left/below the 'O_11 - Xe + Ar' line corresponds to the overlapping (blue+orange) subsets in the right panel of Fig. 2, and the region above/right corresponds to the non-overlapping (blue-only) O_1 region.]
The five DM-nucleon interaction models we have outlined above encompass a range of phenomenologically-driven as well as more theoretically-motivated models, leading to a wide range of direct detection signals. We calculate the signal spectra in each case using the publicly available Python code WIMpy 4 [51]. For the recoil rates, we use expressions from Ref. [23] and Ref. [38] 5. The required nuclear response functions are taken from the Mathematica package provided in Ref. [23], supplemented by those calculated in Ref. [52]. We assume isospin-conserving (c_p = c_n) NREFT interactions throughout.
We incorporate halo uncertainties [53] by assuming Gaussian likelihood distributions for three parameters of the Maxwellian velocity distribution of DM. We sample from these distributions as nuisance parameters in our signal calculation and include an additional penalization term in the TS in Eq. (1). The penalization term is simply the likelihood ratio, which we treat as an additional contribution to the euclideanized signal, taking the form (η_i − η̄_i)/σ_i, where η_i are our nuisance parameters and η̄_i their mean values. For a large number of sampled points our technique exactly matches a profile log-likelihood analysis, as described above. We check that this limit is saturated by reducing the number of sampled points until our results no longer noticeably change. We include uncertainties on the Sun's speed v_⊙ = (242 ± 10) km/s [54], on the local circular speed v_c = (220 ± 18) km/s [55], and on the Galactic escape speed v_esc = (533 ± 54) km/s [56]. We assume that these uncertainties are uncorrelated (as in e.g. Ref. [57]), though in general correlations coming from the modelling of the Milky Way halo can be included [58,59].

3 We note, however, that because the DM is still non-relativistic, the effects of light mediators can be incorporated into the NREFT by including the appropriate propagator [38,39].
4 See https://github.com/bradkav/WIMpy_NREFT.
5 Note that the normalisations of the NREFT operators in Ref. [38] and Ref. [23] differ (typically by factors of m_N and 4m_χ). We use the normalisation of Ref. [23] and adapt the calculations of Ref. [38] for consistency.

Direct dark matter searches.-We implement two toy detectors, both designed to resemble the expected advancement in direct dark matter searches over the upcoming five to ten years. Xenon detectors have, in the past few years, displayed the highest sensitivity to WIMPs with masses of O(10) GeV and heavier. In particular, XENON1T [6] has placed the most stringent constraints, reaching a spin-independent cross section below 10^−46 cm² in its first observation run. We therefore implement a XENON-like detector in light of the forecast sensitivity gains, with multiple experiments on the horizon [60][61][62][63]. In addition, we implement a detector containing a target material with no nuclear spin, in order to maximize the discriminability of spin-dependent operators. Our material of choice is Argon, for which we use the DarkSide-50 [64] detector specifications 6. Our signal implementations and background assumptions are briefly described below.
XENONnT: Firstly, we implement a simplified XENON1T for which we adopt an S1-only analysis. We use the change of variables dR/dS1 = (dR/dE_R) × (dE_R/dS1), as in [30], to transform our recoil energy spectrum into a distribution in the S1 plane. We approximate dE_R/dS1 using the red line in the bottom panel of Fig. 2 in [6] 7. We use an exposure of 3.6 × 10^4 kg days, multiplied by an additional factor of 0.475 to account for the reduced detector volume when only considering signals in the reference region. We sum over the different Xenon isotopes, weighting by their naturally-occurring mass fractions, as given in Ref. [65], and assume a signal efficiency as given in Fig. 1 of [6]. We use 19 bins linearly spaced between 3 and 70 PE (corresponding to nuclear recoil energies roughly in the range [5, 40] keV). The number of bins was chosen to increase the computational efficiency of the euclideanized signal calculation with no noticeable loss in accuracy. We have checked that including a 20% energy resolution [66] and increasing the number of bins has no substantial effect on our results.
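The change of variables above can be sketched numerically: given a monotonic mapping S1(E_R), the transformed spectrum is dR/dS1 = (dR/dE_R) × (dE_R/dS1). The toy below uses an invented analytic spectrum and a linear mapping (not the measured XENON1T relation) purely to illustrate the Jacobian step:

```python
import numpy as np

def spectrum_in_s1(er, s1_of_er, drder):
    """Transform a recoil spectrum dR/dE_R into dR/dS1.

    er       : grid of recoil energies [keV]
    s1_of_er : S1 values at each grid point (monotonic mapping)
    drder    : dR/dE_R evaluated on the grid

    Returns (s1, drds1), with dR/dS1 = dR/dE_R * dE_R/dS1 and the
    Jacobian dE_R/dS1 estimated by finite differences.
    """
    s1 = np.asarray(s1_of_er, dtype=float)
    jac = np.gradient(np.asarray(er, dtype=float), s1)  # dE_R/dS1
    return s1, np.asarray(drder, dtype=float) * jac

# toy check: dR/dE = exp(-E) with S1 = 2 E gives dR/dS1 = 0.5 exp(-S1/2)
er = np.linspace(1.0, 10.0, 200)
s1, drds1 = spectrum_in_s1(er, 2.0 * er, np.exp(-er))
```

For the linear toy mapping the finite-difference Jacobian is exact; for a measured, nonlinear S1(E_R) curve the same code gives a numerical approximation.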
Background distributions are described in Fig. 3 of Ref. [6], for which we assume a 1% uncertainty on each component separately. For our future Xenon detector, we scale up the observation time of XENON1T-2017 by a factor of 100, with the background rates assumed to stay constant. This exposure roughly corresponds to that expected for the full run of the XENONnT experiment [62], so we will refer to this future detector as XENONnT.
DarkSide20k: We directly use the recoil spectrum dR/dE_R as our signal, assuming that the only relevant isotope is Argon-40. The nuclear recoil efficiency is taken from Fig. 6 in Ref. [64]. The background is assumed to be flat across the entire energy range, with an expected 0.1 events over 1422 kg days of observation. A 10% uncertainty is assumed on the flat background component. Again, for computational efficiency we use 19 linearly spaced bins between 32 and 200 keV. Our future detector corresponds to a year of exposure for a 20 tonne detector, giving 7.3 × 10^6 kg days of exposure. We assume that the background will remain at 0.1 events in the total exposure. This detector configuration roughly resembles DarkSide20k [67].

6 Liquid noble detectors typically do not have sensitivity to DM particles lighter than a few GeV, so we restrict our attention to m_χ > 10 GeV in the current work.
7 We therefore assume that the median expected value of S2 given S1 approximately describes the relevant recoil energy.
Results.-In Fig. 1, we show the expected 68% CL reconstruction regions for a set of mutually distinct parameter points, for our XENONnT-like detector. The confidence regions are constructed by querying spheres of radius r_α(M) = 1.52 in the euclideanized signal space. The number of these regions corresponds approximately to the figure of merit in Eq. (3).
In the central panel of Fig. 2, we illustrate the power of XENONnT to discriminate between operators in the 3-dimensional model space M of mass, O_1, and O_11. With increasing numbers of events, the number of discriminable signals increases, though the majority of signals are compatible with both O_1 and O_11. In the right panel of Fig. 2, we show the same model discrimination exercise, but now including DarkSide20k. In this case, there are large fractions of models which are consistent with either O_1 or O_11, but not both. The addition of an Argon detector not only increases the number of discriminable signals (from 133 to 291) but also enlarges the region of parameter space where O_1 and O_11 can be discriminated from each other. The left panel of Fig. 2 corresponds to the same scenario as the right panel, but summed over the number of signal events (in XENONnT).
In Fig. 3 we show the regions of the parameter space of spin-independent (O 1 ) DM in which discrimination from O 4 , O 11 and magnetic dipole DM would be possible. For O 4 (spin-dependent interactions), we see that 2σ discrimination is possible at high DM mass even down to small cross sections, when both Xenon and Argon experiments are used. The spin-zero Argon nucleus has no spin-dependent coupling, so we can discriminate well as long as the Argon detector has sensitivity (down to masses O(20 GeV), below which the majority of recoils are below the Argon energy threshold of 32 keV).
For O_11, discrimination is possible at high mass because of the different spectral shapes of O_1 and O_11, though cross sections larger than ∼10^−46 cm² are required to obtain enough events to map out the spectra accurately. At low mass, the peak in the O_11 spectrum falls below the energy threshold of the experiments; for both O_1 and O_11 the exponentially falling tail of the DM velocity distribution dominates the spectral shape [68], making discrimination impossible.
For magnetic dipole interactions, discrimination is also possible at high mass, given enough signal events. We note a 'kink' in the boundary for magnetic dipole interactions around m_χ ∼ 50 GeV: for large DM masses, the short-range spin-dependent dipole-dipole component of the magnetic dipole interaction begins to dominate [41]. In this case, discrimination prospects are good with the inclusion of the spin-zero Argon detector.
We find that for the mock detectors we consider, SI interactions cannot be distinguished from millicharged DM, which is therefore not shown in Fig. 3. The recoil spectrum for millicharged DM is similar to that of O_1, but has an extra E_R^{−2} suppression. However, this more rapidly falling recoil spectrum can be mimicked by an SI interaction with a smaller DM mass. As demonstrated in Refs. [16,39], low-threshold semiconductor experiments are required to distinguish between the two interactions.
Finally, in Fig. 4 we indicate, for various operators, the region of the parameter space where a closed contour for the DM mass (i.e. a discrimination from TeV masses) would be possible at the 2σ level. At large DM masses, the kinematics of the interaction are such that the recoil spectrum becomes roughly independent of the DM mass, so that to the right of the curves in Fig. 4 it is not possible to obtain an upper limit on m_χ [69,70]. For O_1 we show the regions for Xenon only, as well as for Xenon and Argon combined without halo uncertainties, to demonstrate the improvement in mass reconstruction when including a second detector. In addition, we show that when the two detectors are combined, halo uncertainties make little difference to the mass discrimination, as changes in the velocity distribution affect the spectra in the two detectors differently [8]. All lines apart from O_1 are mapped onto σ_SI by converting to an effective cross section and rescaling to match O_1 at high masses. Operator O_4 has the largest region in which the mass cannot be constrained, due to the lack of signal in an Argon detector.
In general, we see that even in the most optimistic case of cross sections just below the current bounds, it is not possible to pin down the DM mass for masses larger than ∼ 100 GeV. Previous works have demonstrated, typically using a small number of benchmarks [16,70,71], that DM mass reconstruction worsens for large masses; here, we have mapped out precisely where this mass reconstruction fails as a function of mass and cross section.
Discussion.-The projection and approximation methods introduced in this paper allow us to efficiently characterize and visualize the phenomenological distinctiveness of direct dark matter signal models in infometric Venn diagrams, as shown in Fig. 2. Furthermore, these methods allow for an efficient exploration of the signal phenomenology of complex models, and hence allow us to make 'benchmark-free' statements like those shown in Figs. 3 and 4, which would otherwise require a large number of gridded Monte Carlo simulations that become computationally unfeasible in higher dimensions. Indeed, in Fig. 3 we see that ruling out non-standard interactions is harder for light Dark Matter, while in Fig. 4 we see that we cannot pin down the DM mass for masses larger than ∼100 GeV, in both cases depending on the interaction cross section. This leaves only a small region of parameter space, with m_χ ∈ [20, 100] GeV and cross sections a factor of a few below current bounds, in which the DM mass and interaction can both be constrained at the 2σ level with near-future Xenon and Argon detectors. Such general statements would not have been possible without an efficient exploration of the Dark Matter parameter space, made feasible with the tools presented here.
The method rests on the euclideanization step introduced in Ref. [30] and works for any Poisson (and hence also Gaussian) likelihood function, as long as background model uncertainties are sufficiently Gaussian. This encompasses, in principle, a large range of New-Physics models, ranging from cosmology to collider signatures. The method is non-Bayesian and focuses on the expected discrimination power of future experiments in Frequentist terms, although Bayesian interpretations are also possible.
This work paves the way for a more complete exploration of the direct detection parameter space and a deeper understanding of the complementarity between detectors. Future work should explore the possibility of discriminating between a wider range of interactions, beyond the subset of five we include here. In addition, the techniques we present may be used to optimize detector properties (target material, thresholds, etc.) in order to understand how operator discrimination can be improved at low DM mass. As we have shown, direct detection alone may not allow us to completely pin down the DM properties. Combining complementary information from direct and other searches (indirect, collider, etc.), coupled with new techniques for efficient forecasting, will provide essential guidance in the future of Dark Matter detection.
Appendix A.-First, we generate a large number of points in the parameter space of model M, θ_i ∈ Ω_M (typically of the order of 10^5). The details of the distribution of the points do not matter in the limit of a large number of parameter points. Each point is projected onto the corresponding euclideanized signal, x_i = x(θ_i). This projection depends on the details of the detector; if multiple experiments are used, the corresponding vectors are concatenated. Euclideanized signals are generated using swordfish [30] (see that paper for technical details). This process essentially provides a sample of the model parameter space Ω_M embedded in the (usually higher-dimensional) euclideanized signal space. In addition, the mapping between these spaces is known if Ω_M is sufficiently well sampled. This sample of parameter points and corresponding euclideanized signals is the basis for the various estimation techniques used in this work.
Confidence contours. Given the sample of projected points x_i, it is straightforward to generate expected contours around any parameter point θ_0 ∈ Ω_M in the model parameter space. Such regions are for instance shown in Fig. 1. To this end, we first calculate the projected signal x_0 = x(θ_0). Then, using standard nearest-neighbor algorithms [73], we identify the set of points θ_i whose euclideanized signals x_i lie within a radius r_α(M) of x_0. Here, r_α(M) depends on the dimensionality of the parameter space Ω_M as well as the significance level of interest, r_α(M)² = χ²_{k=d,ISF}(α), where χ²_{k,ISF} is the inverse survival function of the chi-squared distribution with k = d degrees of freedom. For the 3-dim models that we consider in this paper, and a significance level of α = 0.045, we have for instance r_α(M) = 2.84. The points in the model parameter space Ω_M that belong to the confidence region can then be simply identified by back-projecting the nearest neighbors in euclideanized signal space (this back-projection just requires a lookup in the original list of model parameters). In this way, confidence regions around arbitrary signal points can be generated efficiently.
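The nearest-neighbor lookup described above can be sketched with scipy's cKDTree. The sample below is a hypothetical uniform stand-in for the model embedding (in the paper the x_i come from the detector forecast, not from a uniform draw):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import chi2

# hypothetical sample: parameter points theta_i and their euclideanized
# signals x_i (here a trivial identity embedding, purely for illustration)
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 10.0, size=(5000, 3))
x = theta.copy()

# threshold radius for a d-dim model at significance alpha
alpha, d = 0.045, 3
r_alpha = np.sqrt(chi2.isf(alpha, df=d))  # ~ 2.84 for d = 3

tree = cKDTree(x)
x0 = x[0]                                  # signal point of interest
idx = tree.query_ball_point(x0, r=r_alpha) # neighbors in signal space
region = theta[idx]                        # back-project: confidence region
```

The back-projection is indeed just an index lookup into the original parameter list, as stated above.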
Number of distinct signals. For Fig. 2 we are interested in the (maximum) number of points that can populate the model parameter space Ω_M such that the model points can be discriminated in the sense of Eq. (1). This is equivalent to finding the maximum number of points in the euclideanized signal space that can populate the embedding of Ω_M such that their mutual distance is at least r_α(M). We derive an estimate for this number with the following procedure: for each projected sample point x_i, we calculate the number of neighbors N_i within a distance r_α(M)/2. This can be done rather efficiently using standard clustering algorithms. We then assign a weight w_i = 1/N_i to point x_i; it corresponds to the 'fractional contribution' of a single parameter point to a confidence region. The number of distinguishable signals is then given by the sum

ν ≈ c_ff Σ_i w_i .   (A1)

Here, c_ff is a filling-factor correction related to the packing density of hyperspheres in a d-dim parameter space. For the 3-dim models in which we are often interested, c_ff = 0.74 [74]. We find that this prescription provides an efficient and reliable way to estimate the number of distinct signals of a model. The main requirement is that each of the potential confidence regions contains a sufficiently large number of samples (typically at least ten); adding more points to the original list then no longer affects the result. We tested the stability of our results by doubling the number of sampled points in various examples from the text; the results remained unchanged once the limit of ten points per distinct region was reached.
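This counting procedure can be sketched as follows (function name hypothetical; the 1-d toy at the end uses c_ff = 1 rather than the 3-dim packing value quoted above):

```python
import numpy as np
from scipy.spatial import cKDTree

def n_distinct(x, r_alpha, c_ff=0.74):
    """Estimate the number of mutually distinct signals, as in Eq. (A1).

    x       : (N, n) array of euclideanized signals sampling the model
    r_alpha : discrimination threshold radius
    c_ff    : filling-factor correction (sphere-packing density; 0.74 in 3-d)
    """
    tree = cKDTree(x)
    # N_i = number of sample points within r_alpha / 2 of point i (incl. itself)
    counts = np.array([len(tree.query_ball_point(p, r_alpha / 2.0)) for p in x])
    weights = 1.0 / counts        # fractional contribution per sample point
    return c_ff * weights.sum()

# toy check: a densely sampled 1-d 'model manifold' of length 10 with
# threshold r_alpha = 1 supports roughly 10 mutually distinct signals
pts = np.linspace(0.0, 10.0, 1001).reshape(-1, 1)
est = n_distinct(pts, r_alpha=1.0, c_ff=1.0)  # c_ff = 1 for this 1-d toy
```

The estimate stabilizes once each potential confidence region contains of order ten samples, matching the convergence criterion described above.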
Distinct signals compatible with H_0. We are interested in the fraction of the distinct signal points that are consistent with a null hypothesis defined as a lower-dimensional subspace of the full model parameter space, Ω_S ⊂ Ω_M. Here, we call a point in Ω_M 'consistent' with Ω_S if the composite null hypothesis Ω_S cannot be excluded against the alternative hypothesis Ω_M. To estimate this number, we first generate a large number of points in Ω_S. We then collect all points from the original sample of Ω_M whose minimum distance in euclideanized signal space to any of the points from Ω_S is smaller than the threshold value r_α(M, S). Here, the threshold is derived from a χ²_k distribution with k degrees of freedom, where k is the difference between the dimensionalities of Ω_M and Ω_S (for the examples in the paper, we usually have k = 1, and hence r_α(M, S) ≈ 2). The number of distinct signals compatible with the null hypothesis Ω_S is then simply obtained by restricting the sum in Eq. (A1) to the points within the shell around Ω_S.
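The minimum-distance test can be sketched with a toy 2-d embedding of model M and a 1-d submodel S (here the line y = 0; both samples are invented for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import chi2

# threshold for a nested submodel with k = d - d' = 1 fewer dimensions
r_sub = np.sqrt(chi2.isf(0.045, df=1))   # ~ 2.0, as quoted in the text

# hypothetical embeddings: model M sampled in 2-d, submodel S the line y = 0
x_M = np.random.default_rng(1).uniform(-5.0, 5.0, size=(2000, 2))
x_S = np.column_stack([np.linspace(-5.0, 5.0, 500), np.zeros(500)])

tree_S = cKDTree(x_S)
dmin, _ = tree_S.query(x_M)              # distance to nearest submodel signal
consistent = dmin < r_sub                # points where S cannot be rejected
```

Points within the 'shell' of thickness r_sub around the submodel manifold are exactly those flagged as consistent, and restricting the weighted sum of Eq. (A1) to them gives the compatible-signal count.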
Parameter ranges and nuisance parameters. Finally, the contours in Fig. 3 and Fig. 4 are generated by identifying all points that are consistent with the indicated null hypotheses, as described in the previous paragraph. However, in these figures we also take into account the effects of DM halo uncertainties, as described in the main text (this is not easily possible when calculating the number of distinct signals). To this end, we generate for each point θ_i ∈ Ω_M several euclideanized signals corresponding to various randomly selected DM halo configurations. Again, the specific distribution of these points does not matter as long as their number is large enough to sufficiently cover the various halo configurations. In order to incorporate external constraints on the DM halo parameters, we add an additional contribution to the euclideanized distance calculation, given by (η_i − η̄)²/σ²_η, where η̄ and σ²_η are the mean and variance of the nuisance parameter, and η_i is the value of the nuisance parameter for the specific point i. As indicated in the main text, this contribution to x(θ) is a simple concatenation of (η_i − η̄)/σ_η.
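The concatenation step can be sketched as below (function name hypothetical; the example uses the escape-speed constraint v_esc = (533 ± 54) km/s quoted in the main text):

```python
import numpy as np

def append_nuisance(x, eta, eta_bar, sigma):
    """Concatenate the nuisance-parameter penalisation onto a euclideanized signal.

    Each external constraint contributes (eta - eta_bar) / sigma as an extra
    component, so the squared Euclidean distance picks up the penalisation
    term (eta_i - eta_bar)^2 / sigma^2.
    """
    eta = np.atleast_1d(np.asarray(eta, dtype=float))
    pen = (eta - np.asarray(eta_bar, dtype=float)) / np.asarray(sigma, dtype=float)
    return np.concatenate([np.asarray(x, dtype=float), pen])

# e.g. an escape-speed draw of 587 km/s against v_esc = (533 +/- 54) km/s
x_ext = append_nuisance([1.0, 2.0], eta=587.0, eta_bar=533.0, sigma=54.0)
# the appended penalisation component equals (587 - 533) / 54 = 1.0
```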