Dark Matter Substructure under the Electron Scattering Lamppost

We study the mutual relationship between dark matter-electron scattering experiments and possible new dark matter substructure nearby hinted by the Gaia data. We show how kinematic substructure could affect the average and modulation spectra of dark matter-electron scattering in semiconductors, and the discovery reaches of future experiments with these targets. Conversely, we demonstrate how future data could probe and constrain the substructure dark matter fraction, even when it constitutes a sub-dominant component of the local dark matter density.


I. INTRODUCTION
The search for dark matter (DM) is one of the main goals for both experimental and theoretical physics. Among different search strategies, direct detection experiments are some of the most ambitious. In recent years various collaborations have made significant progress in constraining the parameter space for DM interactions with the particles of the standard model (SM). Traditional nuclear recoil experiments [1][2][3][4][5][6] are pushing the limits on the DM interaction cross section towards the neutrino floor for heavy DM particles, whereas electron recoil experiments [7][8][9][10][11][12][13][14] are probing increasingly smaller DM masses. For a comparison of different targets for sub-GeV DM direct detection, see [15].
It is well known that an accurate understanding of direct detection results depends crucially on the characteristics of the local DM distribution. Indeed, besides the local number density of DM particles, kinematic quantities such as DM mean velocity and its velocity dispersion are critical to the hope for a discovery at direct detection experiments.
The standard halo model (SHM), in which the velocities of the DM particles follow an isotropic Maxwellian distribution, has commonly been used in the computation of DM direct detection rates. Recent work, however, has shown that the Milky Way's history has been punctuated by mergers with dwarf galaxies, which resulted in a rich variety of stellar substructure beyond the traditional halo and disk, such as the debris flow of the Gaia Sausage (also called Enceladus) [16,17], the Nyx stream [18,19], or the so called "shards", the S1, S2a, and S2b streams [20][21][22]. 1 Since dwarf galaxies also contain DM, * jatan buch@brown.edu † manuel buen-abad@brown.edu ‡ jiji fan@brown.edu § shing chau leung@alumni.brown.edu 1 We caution the reader that the analysis extracting the Gaia Sausage substructure was performed in a region of the sky (within galactocentric radii of 7.5-10 kpc and |z| > 2.5 kpc), slightly different from the analysis for the Nyx stream (within radii 6.5-9.5 kpc and |z| < 2 kpc).
these mergers could also result in associated dark matter substructure within our galaxy, beyond that of what pertains to the SHM.
The impact that these new astrophysical discoveries could have on DM direct detection searches has only started to be explored recently [21][22][23][24][25][26]. In this paper, we focus on the effects that DM substructure hinted by these discoveries could have on electron recoil experiments with semiconductor targets. In particular, we are interested in how the differential DM-electron scattering rate depends on DM velocity distributions beyond the SHM, 2 as well as in the hitherto unexplored possibility of using the differential recoil rate to deduce the astrophysical properties of possible DM substructure components (see Ref. [34] for an analysis similar in spirit to ours but with a different objective). A particularly interesting feature of direct detection searches that DM substructure can affect is the seasonal variation of a DM signal [35]. The presence of DM substructure could produce an annual modulation signatures differing in both amplitude and phase from what is expected for DM in the SHM [36,37].
In order to achieve these goals, we need to define the DM velocity distributions we will be using in our analysis. Besides the SHM, we will also consider the halo and Sausage distributions as inferred in Refs. [16,17]. Indeed, there seems to be good evidence that low-metallicity stars originating from older mergers, such as those in the Gaia Sausage, are good kinematic tracers of DM and thus allow for their associated DM distribution to be determined up to uncertainties in the subtructure fraction [17]. 3 However, the correlation between the stellar streams, which arise from more recent mergers, and their associated DM is currently the object of some debate, and is far from being completely understood. Lacking an accurate description of how DM associated with the stellar streams is distributed, we will restrict ourselves to taking the velocity distributions of the stellar streams as benchmarks for their dark matter counterparts. To this end, we use the distributions for the Nyx [18], S1, S2a, and S2b [22] stellar streams. Since we are interested in exploring the relationship between astrophysics and direct detection, we take the stellar streams as mere proxies or placeholders for their associated DM distributions and make no claims about their accuracy as such.
The paper is organized as follows. We revisit the formalism of the DM-electron scattering rate for semiconductor targets, describe the astrophysical setup we consider, and develop an intuition for the impact of various DM substructure components in Sec. II. Then we will present our statistical analysis in Sec. III, and discuss our results on the discovery reaches assuming different DM velocity distributions as well as how future DM-electron experiments could probe fractions of substructure components in Sec. IV. We state our conclusions and the outlook of our work in Sec. V. We include further details in four appendices.

II. DARK MATTER ELECTRON RECOIL RATE
In this section, we first review the basic formalism for dark matter-electron (DM-e) scattering in semiconductors, which could be skipped by readers who are familiar with the subject. We then present a novel description of the effects that the DM velocity distributions from various substructures have on this type of scattering.

A. Formalism
The differential scattering rate of DM particles χ off the electrons in a semiconducting target material, or spectrum for brevity, is given by [29][30][31][32]: where R is the event rate per unit mass; E is the total energy transferred to the electron; N cell is the number of cells per unit mass of target material; ρ χ ≈ 0.4 GeV/cm 3 and m χ are the local DM energy density [41,42] and the DM mass respectively; σ e parameterizes the DM-e coupling; 4 t is the time of the year, and α is the QED coupling constant. The κ(E, t) factor is a "correction" factor that takes into account the particular properties of the semiconducting target, the local DM velocity distribution, and the momentum dependence of the DM-e 4 σe corresponds exactly to the free elastic scattering cross section in the heavy meadiator case.
interactions. It is given by [29][30][31][32]: with µ χ,e the DM-e reduced mass; q the momentum transfer, F DM (q) ≡ αme q n the DM form factor, which parameterizes the momentum dependence of the DM-e scattering; |f crystal (q, E)| 2 the crystal form factor, which describes the response of the semiconductor material to be probed with momentum q and energy E; and g(v min , t) the mean inverse speed of those DM particles with speeds above the minimum v min required to scatter off the target. 5 The time dependence arises from the annual modulation of the DM wind in the lab frame, due to the Earth's motion around the Sun. 6 v min can be found from energy conservation, and is given by [32]: An useful benchmark for Eq. (1), given that the number of semiconductor cells per kg is N cell ∼ 1×10 25 (4×10 24 ) for silicon (germanium), is obtained by taking 20 MeV DM with σ e = 10 −38 cm 2 , which yields a rate of 2 × 10 4 (5 × 10 3 ) events per kg · year, with κ in Eq. (2) giving a ∼ O(1) number for typical values of E.

B. Astrophysics setup
Eqs. (1) and (2) show that the energy and time dependence of the spectrum arises from both the response of the semiconductor to the scattering process and from the DM velocity distribution, encoded in f crystal (q, E) and g(v min (q, E), t) respectively. The latter is given by: with f ( v; ζ, v lab (t)) the normalized distribution of the DM velocity v in the lab frame, cut off at the galactic escape velocity which is taken to be v esc = 528 km/s in the galactic rest frame [44]. ζ are the parameters describing the velocity distribution of the DM components contributing to the local DM. Furtheremore, v lab (t) is the velocity of the lab with respect to the galactic rest frame, v ≡ | v| is the DM speed, F (v, t) the speed distribution, and Θ(x) the Heaviside step function. The velocity of the lab's frame is given by v lab (t) = v + V ⊕ (t), with v the Sun's velocity in the galactic rest frame, and V ⊕ the Earth's velocity in the heliocentric frame. For a detailed list of the numerical values of these astrophysical parameters, see Appendix A.
As discussed in the Introduction (Sec. I), stellar substructures due to past mergers have been discovered recently. As a consequence, DM substructures associated with these mergers could contribute to the local DM density, beyond the SHM. These dark substructures will have their own velocity distributions, which will contribute to Eqs. (4) and (5) as different component terms: where η i is the fraction of the local dark matter that comes from the i-th component present. The purpose of the rest of this section is to study the impact of these different velocity distributions on g(v min , t), and consequently on the scattering spectrum.
Let us consider a given DM component contributing to the local DM density, with velocity distribution f ( v; ζ, v lab (t)) and corresponding speed distribution F (v, t). 7 The effect it will have on g(v min , t) can be heuristically understood in terms of three quantities: the value of v at which the component's F (v, t) peaks, which we call the component's most probable speed; the time of the year at which the DM wind coming from this component is at its largest, called the component's characteristic time t c ; and the coplanarity b = sin λ, where λ is the angle between the DM wind and the normal to the Earth's orbital plane: b = 0 (1) when the wind is orthogonal (parallel) to the plane [23,36]. Note that since F (v, t) is time-dependent, the most probable speed is actually a function of the time of the year as well. Therefore, for convenience we define the most probable speed v mp as that which maximizes the yearly average F (v) 8 of the component's speed distribution.
In Table I we list v mp , t c , and b for possible local DM components we consider throughout this work: the SHM, Gaia's halo and Sausage [16], and possible DM streams associated with Nyx [18], S1, S2a, and S2b [22] stellar streams. Note that we use the values of the stellar streams as benchmarks for the potential dark matter distributions associated with Nyx, S1, S2a, and S2b. We want to remind the reader again of the caveats which are already 7 For brevity we drop the component index i. Whether we refer to the total sum of all the components or only to the specific contribution of one of them will be clear from the context. 8 We denote with a bar the yearly average of a time dependent quantity: f (x) = 1 year year 0 dt f (x, t). mentioned in Sec. I: we do not claim that these are the true distributions for DM substructures associated with the stellar streams, but only as mere proxies for them. The correlation between stellar streams and their corresponding DM distributions is currently the subject of extensive study. For more details on these DM distributions as well as on how to compute v mp , t c , and b, see Appendices A and B.

C. Effects of astrophysics on scattering spectrum
Having established the astrophysical setup that we will use throughout this paper, we now devote ourselves to the study of its impact on the DM-e scattering spectrum. We will divide our study into two parts: one dealing with the effects of the most probable speed v mp on the spectrum's yearly average, and another with the effects of the characteristic time t c and the coplanarity b on the annual modulation.

Average Spectrum: the impact of vmp
Let us begin by considering the impact different DM components have, through their most probable speeds v mp 's, on the yearly average g(v min ). Then we will proceed to consider v mp 's effects on the scattering spectrum via κ in Eq. (2).
The left plot in Fig. 1 shows F (v), normalized to its peak F (v mp ), for three example distributions: the SHM in purple, the Nyx stream in blue, and the S1 stream in red. The vertical dotted lines signal the most probable speeds v mp for each distribution. It can be seen that Nyx, a prograde stream, has a low v mp ; whereas S1, a retrograde stream, has a larger one. This is due to the Sun's relative motion with respect to the galactic rest frame.
It is evident from Eq. (4) that g(v min , t) (as well as its yearly average, g(v min )) is a monotonically decreasing function of v min . The right panel in Fig. 1  FIG. 1: Left: Yearly average of the DM speed distribution F (v) for the SHM (purple) and Nyx (blue) and S1 (red) streams, normalized to their maximums respectively. Right: Yearly average of the mean inverse speed g(v min ) for the SHM and the Nyx and S1 streams. The vertical dotted lines represent v mp of each distribution.
for the same distributions as before, as well as dotted vertical lines for v min = v mp . From the left panel in Fig. 1, we can see that the integrand in Eq. 4 has most of its support for values of v min < v mp , which results in an approximate plateau at smaller v min 's for g(v min ). However, for v min > v mp , Eq. (4) integrates over a diminishing portion of F (v), which results in the tail of g(v min ). We can then speak of a "width" for g(v min ), given by v mp . Indeed, comparing S1 with Nyx, we see how g S1 (v min ) has support over a wider range of v min 's than g Nyx (v min ), since v mp,S1 > v mp,Nyx .
Notice that the maximum height of g(v min ), given by g(0) at v min = 0, is inversely correlated with its width. The reason is that g(0) is the mean inverse speed of the distribution, which can be related to v mp as follows: where · · · represents the integral over the speed distribution. Thus, since v mp,S1 > v mp,Nyx , we have g Nyx (0) > g S1 (0).
Having described the particularities shown by g(v min ) for components of different v mp 's, we now focus on their consequences for κ, through the (q, E) dependence of v min in Eq. (3). We also need to inspect the interplay of different factors making up the integrand in Eq. (2), which involve not only g(v min ) but also the crystal form factor.
The left panel of Fig. 2 shows the contours of the silicon form factor |f Si (q, E)| 2 as a function of the transferred momentum q and deposited energy E. 9 Note that the 9 The crystal form factors for silicon and germanium, are taken from the publically available tables in ddldm.physics.sunysb.edu/ddlDM/, which were computed with the QEdark module [32] of Quantum Espresso [45,46].
form factor is at its largest around the typical values of momentum transfer in scattering off electrons: q ∼ few×αm e , as well as for energies of order E ∼ few×10 eV. We also present the curves for which v min (q, E) = v mp for SHM, Nyx, and S1, for DM mass m χ = 20 MeV. To the left of these curves (low E) lies the plateau of their corresponding g(v min ); whereas to their right (high E) lies its tail. Therefore κ, and thus the spectrum, decays at large E. We could also see from the plot that S1 stream could probe region with larger E with sizable |f Si | 2 while Nyx stream could only probe region with smaller E where |f Si | 2 is suppressed.
From Eq. (3), we observe that increasing m χ allows for a larger region of (q, E) space to yield sizable values of g(v min ). We show the effects of varying DM mass in the right panel of Fig. 2. It shows g(v min (q * , E)) at constant q * = 2αm e , for SHM, Nyx, and S1, and normalized to its largest SHM value: g SHM (0). Also plotted in this panel is the crystal form factors for silicon scaled by 1/10, at the fixed momentum transferred q * . We consider a family of curves with different DM masses between 10 and 100 MeV, with decreasing opacity for larger masses. From the plot, one could see that as expected, when m χ increases, g(v min (q * , E)) has support over larger energies, where the crystal form factors increase. Thus, either larger DM masses or components with larger v mp 's allow for scattering events to occur at larger energies.
In this paper, we focus on silicon target and similar results could be obtained for germanium target as well.

Annual Modulation: the impact of tc and b
We now consider the time-dependence of the spectrum. As mentioned before, the combined velocities of the Sun around the Milky Way and of the DM particles in a given component result in a "DM wind" in the Sun's 10 MeV 100 MeV Contours of the silicon form factor |f Si (q, E)| 2 (green) as a function of (q, E). Also included are the curves in (q, E) space for which v min (q, E) = v mp for SHM (purple), Nyx (blue), and S1 (red), according to Eq. (3) with m χ = 20 MeV. The shaded region corresponds to v min (q, E) above the galactic escape velocity (taken to be v esc = 528 km/s in the galactic rest frame), for which there cannot be any scattering events. Right: g(v min (q * , E)) at a constant q * = 2αm e , plotted as a function of E for SHM, Nyx, and S1; and normalized to g SHM (0). Decreasing opacity corresponds to increasing DM mass, between 10 MeV and 100 MeV. Also plotted in dashed green is the crystal form factor |f Si (q, E)| 2 for silicon.
frame of reference. Since the Earth performs one revolution around the Sun in a year, in the Earth's frame this DM wind displays an annual modulation, which will yield an increase or decrease in the expected number of DM-e events, depending on whether the Earth moves against or with the DM wind, respectively. Throughout the rest of this section, we define modulation as As mentioned in Sec. II B, a given DM component will have a characteristic time t c . At t c , the most probable speed of the DM particles is at its highest it will be all year; six months later it will be at its lowest. In addition, the component's DM wind will have a coplanarity b with the Earth's orbital plane: maximal coplanarity (b = 1) will result in the annual modulation having its maximum amplitude, whereas no coplanarity (b = 0) will result in no modulation at all. Thus, the quantities t c and b of a given component determine the phase and amplitude of the modulation of the associated DM wind.
We can use Eq. (6) to understand the effects t c and b have on g(v min , t). As is explained in more detail in Appendix B, at t c , the relative velocity of the DM wind in the Earth's frame will be at its largest, resulting in a most probable speed given by v mp + bV ⊕ , with V ⊕ the Earth's orbital speed. Six months later the velocity will be at its minimum, given by v mp − bV ⊕ . From Eq. (6) we then arrive at the following expression for the fractional amplitude A of the modulation δg(0, t): The reader should keep in mind that Eq. (7) is only an useful approximation for the magnitude of the modulation amplitude, and strictly speaking, it is valid only for v min = 0. The negative sign arises from the fact that the height of the plateau for g(v min , t), g(0, t c ), is inversely correlated with the most probable speed, so when the latter is at its maximum at t c , g(0, t c ) is at its minimum. The left panel of Fig. 3 illustrates this behavior for SHM, Nyx, and S1. Also plotted is g(v min , t c +6 months), which displays the opposite behavior: the plateau is at its maximum. In addition, the tail of g(v min , t) also modulates, but its modulation is out of phase with the plateau's. The reason is that, as discussed in the previous subsection II C 1, the most probable speed is correlated with the width of g(v min , t). Thus, when this speed is at its largest at t c , g(v min , t c ) is at its widest and it has support for more values of v min . In summary, plateau and tail of g(v min , t) present an annual modulation in opposite ways. The right panel of Fig. 3 further illustrates this by showing the contours of the modulation δg(v min , t) for S1, and marking the time t = t c and v min = v mp . Indeed one could see two opposite modulation phases for v min 500 km/s and v min 500 km/s.
Finally, since v min depends on E (Eq. (3)), we expect this same behavior to be displayed by the scattering spec- , and S1 (red). Right: Example of annual modulation δg(v min , t) for S1. The dotted line marks v min = v mp,S1 , while the dashed line marks t = t c,S1 . trum itself, for low and high values of the energy respectively, in accordance with the right panel of Fig. 2. However, there is a subtlety here. For a light DM and a component with a small v mp such as the Nyx stream, v min needed for the scattering above the experimental threshold could always lie in the tail of g(v min , t). In this case, we won't observe phase flipping at low and high energies and there will only be one phase observed. For more details, see Appendix C.

D. Summary
So far we have described how different DM components, characterized by most probable speed v mp , characteristic time t c , and coplanarity b, affect the mean inverse speed g(v min , t) and consequently the scattering spectrum dR/dE. We have used SHM, Nyx stream, and S1 stream as examples; yet our findings apply to other DM components, and can be summarized as follows: • v mp determines the width of g(v min , t), as well as the energies E at which the we expect most scattering events.
• v mp is inversely correlated with the height of the plateau of g(v min , t), and consequently the number of events at the lowest energies. 10 The modulation amplitude is also inversely proportional to v mp .
• t c determines the phase of the annual modulation of both g(v min , t) and the scattering spectrum: the 10 If DM mass is very small, the energy region associated with plateau of g(v min , t) may not be kinematically available to the scattering process and only the tail is observed.
time of the year at which the plateau (tail) is minimized (maximized).
• b determines the amplitude of the annual modulation.
In an experiment such as SENSEI [9,11] or EDEL-WEISS [10] the scattering spectrum is observed not as a continuum in E, but as a function of the number Q of electron-hole pairs detected in the semiconductor, also called the ionization level of the semiconductor. A very simple map between E and Q is given by [32]: where E gap is the band-gap energy of the semiconductor and ε the mean energy per electron-hole pair. E gap = 1.2 eV and ε = 3.8 eV for silicon, while E gap = 0.67 eV and ε = 2.9 eV for germanium.
Experimentally there is then a natural binning of the E axis in terms of Q. In Fig. 4 we show the yearly average of this binned energy spectra for m χ = 20 MeV and 1 GeV, for the extreme cases where 100% of the local DM comes from SHM, Nyx, or S1 components. Binning the t axis as well, for example in months, we can describe the scattering spectrum, for different combinations of DM and astrophysical parameters, as the expected number of events in a time-energy bin (t i , Q j ). Fig. 5 shows the Q-month binned scattering spectrum off silicon for DM of mass m χ = 20 MeV, n = 0 (F DM = 1), and σ e = 10 −37 cm 2 per kg − year of exposure, for 100% Nyx or S1 components. Both Fig. 4 and Fig. 5 confirm the relations we list above.  assuming that all DM particles coming from the Nyx stream (left), or the S1 stream (right). The numbers indicate the expected number of events in that bin, while the colors correspond to the annual modulation and indicate whether the numbers are above or below the yearly average.

III. ANALYSIS SETUP
So far we have only focused on the phenomenology of DM-e scattering with semiconductor targets for several DM substructures. At this stage, we can pose a couple of legitimate questions: a) what are the prospects of detecting a DM signal with next-generation electron recoil experiments for a given DM velocity distribution? b) assuming discovery, can we distinguish the effects of DM substructures such as streams in a statistically significant way? While part a) has received attention in the literature, most notably in the pioneering work by Refs. [31,32] (see also ref. [33] for a recent exploration), a detailed examination of part b) in context of DM-e scattering exper-iments, to the best of our knowledge, has not been performed yet. With that in mind, we outline a statistical analysis in this section that uses forecasting as a complementary probe of DM substructure properties alongside the usual discovery reach contours.
However, an important complication in detecting DM and estimating its properties with electron recoil data is the existence of experimental background due to "dark current" events. Frustratingly, the background, along with the DM signal for a large range of masses and DM form factors, peaks at low ionization Q bins [9,11,12]. Given that, we also consider a more optimistic possibility of mitigating a large background component through sideband measurements, which can be achieved for re-   [7] in units of events/kg/day. The OPT model has an ionization threshold Q th = 3 (hence no entries for R bkg Q=1,2 ) to mitigate the background through sideband measurements. Following [47], the constant background rate is taken to be uniform across all bins.
coil data with higher thresholds Q th . 11 We also propose the use of time domain data to probe the characteristic modulating component of a potential DM signal. Since we can reasonably assume the background to be timeindependent, the time domain channel can essentially be treated as background-free for downstream analysis after DM discovery. Although we will derive our results in this paper assuming an idealized SENSEI-like [7] experiment outlined in Table II, the formalism described here may easily be extended to experiments with different targets.
Thus, in the rest of the section, we develop a profile likelihood analysis applicable to next-generation electron recoil experiments with two main goals: • estimate the discovery reach for: i) a DM signal in presence of a realistic experimental background, and ii) an annually modulating DM signal over the average recoil spectrum by leveraging full time domain data.
• constrain the DM fraction in substructure by distinguishing signals from various astrophysical configurations.
For ease of reference, we summarize the important details of our statistical analysis in Table III.
Statistical test DM parameters Relevant eqs.
DM signal discoveryσe, mχ (9), (10), (12) Modulation discoveryσe, mχ (9), (12), (14) Sensitivity forecasts η, mχ (9), (10), (15)  We define the likelihood function (henceforth simply the likelihood) for a hypothetical experiment which detects electron-hole pairs produced from a DM-e recoil event in both ionization and time bins, where the product is over both time (n t ) and ionization (n Q ) bins respectively. The likelihood L b (N |ψ) in general (we note an exception later in the section) is given by the Poisson probability distribution, such that N obs and N th are the number of observed and predicted events in the i th time and j th ionization bin. For brevity, we have dropped all constant terms from the expression above. The predicted events in each bin consist of the signal rate S ij (θ, λ), the background rate B ij along with an overall normalization given by the exposure E. Assuming a linear model, we have The signal is evaluated for each DM model, ψ = (θ, λ ), where θ and λ are the signal and nuisance parameters respectively. The identification of a model parameter as a signal or nuisance parameter is determined by the nature of the analysis. 12 Lastly, we note that the definitions above can be trivially extended to the average spectrum case by dropping all t bins and using only the Q bins with the recoil rate derived for the mean Earth velocity. In experimental terms, this is equivalent to working with all the data collected through the duration of the experiment in each Q bin.
Before formally defining a test statistic (TS) for each of the goals we stated above, we note that in the absence of real data, we use the Asimov data set [48] for estimating the median sensitivity of an idealized next-generation experiment. In practice, the Asimov data set is simply the mock signal (plus background wherever applicable) corresponding to the parameter values of a chosen benchmark point with no statistical fluctuations.
The discovery reach for an experiment is expressed through the likelihood ratio TS [49], 12 For example, we treat the fraction of DM in substructure as a nuisance parameter while estimating the discovery reach in σe − mχ space, but as a signal parameter while constraining DM's astrophysical properties. Furthermore, for the case of an unknown background, B should also be treated as a nuisance parameter.
where D Asm ≡ N Asm (θ, λ ) such that (θ, λ ) correspond to the signal and nuisance parameter values respectively for the chosen benchmark point. The ratio can be shown to have a χ 2 distribution with 1 degree of freedom in the asymptotic limit [48,50], whereas the significance Z of the detection is then expressed in terms of the inverse CDF of the normal distribution Φ(x) for a given p-value, For the case of DM discovery, Eq. (12) can be evaluated in a straightforward fashion by substituting the Poisson likelihood for the average recoil spectrum given by Eq. (10) into Eq. (9). However, for the discovery of an annually modulating signal, the likelihood must take into account that the modulation, defined as the bin-by-bin difference between the time domain and average rates, is not an experimental observable. Instead, we note that since both the rates are Poisson distributed individually, their difference follows the Skellam distribution given by, where N mod is the number of modulation events in the i th time and j th ionization bin and may be positive or negative, depending on the values of N tim and N avg , respectively the mean number of time domain and average events; I k (x) is the modified Bessel function of the first kind. Thus, for estimating the discovery reach of a modulating signal, we adopt the TS in Eq. (12) with the likelihood function defined in Eq. (14).
Meanwhile, we use sensitivity forecasts to illustrate an experiment's capability to distinguish signals from various DM substructure components. The TS, based on the pairwise comparison of neighboring parameter points, is defined as, where the Asimov data D Asm is defined analogous to that in the discovery reach case assuming the Poisson likelihood in Eq. (10). In frequentist statistical terms, the TS above is used to reject the null hypothesis that signals corresponding to θ 1 and θ 2 are indistinguishable at the (1−α) % confidence level (CL). We use the euclideanized signal (ES) method introduced by Refs. [51,52] for a fast, benchmark-free calculation of Eq. (15). For more details on how the ES method is implemented in context of direct detection experiments, we refer the reader to Refs. [26,53].

IV. RESULTS
In this section, we present the results of our statistical analysis in form of discovery reaches and sensitivity forecasts. We focus on the potential of an idealized SENSEIlike electron recoil experiment for DM discovery, and for probing the fraction of the local DM density, η, in kinematic substructure such as a stream and a debris flow using the observed spectrum. We use the velocity distributions described in Sec. II B as benchmarks for a phenomenological study, and treat DM mass, cross section, and DM substructure fraction(s) as free parameters depending on the analysis. All our results have been derived assuming a 1 kg-year exposure and background models summarized in Table II. In Appendix D, we include additional supplemental results.

A. Discovery reaches
From a statistical point of view, our Asimov discovery reaches indicate the median sensitivity of a SENSEI-like experiment to a DM signal. Before discussing the case of different DM substructure components, however, we study the basic characteristics of discovery contours for the vanilla SHM velocity distribution.
In the top row of Fig. 6, we plot two types of 5σ discovery reach contours for different DM form factors: one for detecting DM over a given background model (MAX or OPT), and the other for observing a modulating DM signal over the average recoil spectrum. Since the modulation fraction for DM-e scattering is O(1-10)%, we would naively expect the modulation sensitivity to be at least an order of magnitude weaker, i.e, it requires a larger cross section to achieve a similar significance of detection, than that of simply discovering DM. The presence of experimental backgrounds, however, leads to two important features. First, the DM discovery reach using the MAX model is significantly weaker than the one using the OPT model for m χ ∼ > 3 MeV. Second, for lower masses, m χ 5 MeV, an experiment is more likely to discover a modulation signal first. To explain these behaviors, we note that the recoil spectra for DM-e scattering peaks in the first couple of Q bins for masses m χ O(10) MeV, with the peak shifting to higher Q bins for increasing DM masses where the background rate is dominated by the constant rate component. At larger DM masses, the TS of discovery reach roughly has the usual √ B scaling. Since the constant rate in the MAX model is ∼ 10 3 greater than in the OPT model, we can see that the discovery reach for a general DM signal is worse by a factor of 30 in the MAX model than that in the OPT model. Meanwhile, at lower masses where Q = 1, 2 bins drive the discovery reach, the extremely high background rate in the MAX model and the higher threshold of the OPT model lead to considerably weaker reaches for a non-modulating signal Also shown for reference are the latest 90% CL upper limits from SENSEI [11] (cyan), XENON10 [54] (orange), and XENON1T [13] (gray). a Bottom row: Modulation discovery reaches for: (left) various fractions of Nyx and S1 streams represented by blue and red contours; (right) S2a and S2b streams along with the Gaia Sausage indicated by brown, green, and orange contours respectively. All the contours for DM substructure components assume that it constitutes 100% of the local DM density.
a Despite the more conservative bound shown by XENON1T in Fig. 5 of Ref. [13], we only show the constraints for events with ≥ 12 e − , in case of F DM = 1, because the charge yield for liquid xenon has never been measured below these energies. Meanwhile, for F DM ∝ 1/q 2 , the corresponding constraint of XENON1T lies above the cross section range of the plot.
for both background models, compared to the modulating one. The discovery contours follow a similar trend for F DM ∝ 1/q 2 , except that they are marginally stronger (weaker) at lower (higher) DM masses. Again, this is explained by the 'squeezing' of the recoil spectrum (more events in the peak, fewer in the tail) to lower Q bins with an enhanced crystal form factor at low q, due to the DM form factor.
Next, we investigate how the unique kinematic features of various DM substructure components discussed in Sec. II C will affect their detection prospects in the bottom row of Fig. 6. We focus on the discovery of a modulation signal since it can effectively be considered as a background-free channel. The most striking feature for all substructure components is the marked deviation of their modulation discovery reach compared to the SHM contour. 13 For example, relative to SHM, the reaches for Nyx, S2a, S2b, and Sausage, all weaken (strengthen) for low (high) DM masses with an inflection point at m χ ∼ 10 MeV. Upon closer scrutiny, we observe that Nyx, S2a, S2b have similar sensitivity at high DM masses whereas the Sausage contour is quite a bit weaker, while at the low mass end, amongst them the reaches follow the order (in increasing strength): S2a < Nyx < S2b < Sausage. These behaviors could be understood as follows: 14 i) the amplitude of modulation approximated by Eq. (7), which depends on v mp and the coplanarity b, is lower for SHM and Sausage relative to the streams (other than S1), leading to weakened sensitivities to these astrophysical components at higher mass; ii) since v min at low DM masses lies in the tail of the velocity distribution, substructure components with greater v mp and/or velocity dispersion σ v (see Table IV) have a stronger discovery reach, explaining the order of reaches at the low mass end. We can also apply these heuristics to understand qualitatively the behavior of the S1 stream as well. It has an increased sensitivity at low DM masses due to its high v mp with almost an order of magnitude improvement over SHM at m χ ∼ 2 MeV. But when combined with its small coplanarity b, the suppressed amplitude of modulation leads to a much weaker reach at high DM masses, compared to the other streams.
A major caveat for the discussion above is the implicit assumption that DM in each component constitutes 100% of the local DM density. We also show in the bottom left panel the discovery reaches for an astrophysical model where only 20% of the DM lies within Nyx or S1 streams and SHM contributes the remaining fraction. Although there appears to be very little difference in these discovery contours, as we argue in the following sections, DM-e recoil experiments could still play a significant role in constraining the DM fraction in streams if we assume the discovery of a signal.

B. Resolving DM substructure fraction
Using the average DM-e recoil spectrum in Eq. (15), we forecast the sensitivity of a next-generation SENSEI-like experiment to reconstruct the DM fraction η i in the i th DM substructure component. In particular, we consider how the resolution of η varies with DM mass in the case which contains only a single DM substructure component and a total velocity distribution given by In addition, we try to simultaneously constrain η's for components in a dual substructure model with the velocity distribution given by where the subscript i = 1, 2 refers to any two distinct DM substructure components. The real astrophysical composition of the local DM distribution could be more complicated, and the two cases we consider just serve as illustrative examples of how DM-e scattering experiments could probe DM substructures. For concreteness, we only discuss the results for F DM = 1.
In Fig. 7, we show the 68% CL sensitivity forecast contours using MAX and OPT background models for S1, Nyx, S2b, and Sausage substructure components. We find that the next generation SENSEI-like experiment with 1 kg-year exposure can narrow down DM substructure fractions as a function of m χ . In particular if we assume an optimistic background, we can localize η given a perfect knowledge of the DM substructure distribution. In a more realistic interpretation, when our knowledge of the DM velocity distribution is not perfect, we still expect a reasonable resolution of η at m χ 20 MeV, with either background model. Our result should still hold qualitatively in the situation when the velocity distribution of a dark stream does not perfectly correlate with the stellar one.
We also comment upon several general features of the degeneracy contours in Fig. 7. First, the experimental sensitivity progressively worsens in both m χ and η i directions when increasing the DM mass for all four components, whereas the resolution is fairly good for all streams at low DM masses even with the MAX background model. Broadly speaking, these features can be interpreted in terms of the DM mass and velocity dependence of the recoil rate. The overall scattering rate could be kept roughly similar when varying DM mass and substructure fraction simultaneously. Yet at larger DM masses above 20 MeV, spectra shapes are degenerate for a fairly large range of masses and cannot be fully broken by varying the substructure fraction. While for m χ 20 MeV, the spectral shape in high Q bins is quite sensitive to η, which strongly affects the number of DM particles in the tail of the velocity distribution as the substructure could have a quite different value of v mp with respect to the remaining SHM/halo component.
One may also notice the intriguing differences between the orientations of contours at the benchmark points with The remaining (1 − η i ) fraction of DM is constituted by the SHM and the smooth isotropic halo for streams and Sausage respectively. As in Fig. 6, the solid (dashed) contours denote the forecasts for the OPT (MAX) background model. The benchmark scattering cross section for all the substructure components isσ e = 10 −38 cm 2 , which is in the neighborhood of the 5σ modulation discovery reach for the benchmark masses considered here.
the same DM masses for S1 on one hand and for Nyx, S2b, and Sausage on the other hand. For the latter class of DM components, for a DM mass of 10 MeV, our sensitivity forecasts indicate a degeneracy between η and m χ with a preference for higher DM masses at higher DM substructure fractions, while the forecast for S1 is largely independent of η. At small masses around and below 10 MeV, v min is large for given (q, E) and thus it only probes the tail of g(v min ). Since Nyx, S2b, and Sausage all have a lower v mp compared to that of SHM, increasing their η at a fixed m χ leads to a smaller tail for g(v min ) of the combined velocity distribution. This effect is compensated by lowering v min and enhancing g(v min ) when increasing m χ . Conversely, S1's high v mp implies a much wider plateau as shown in Sec. II C 1, which allows for more scattering at low DM masses leading to a considerably better resolution. Note that for higher DM masses, the orientation of the contours is flipped, i.e when increasing η, the S1 contours tend toward higher values of m χ , whereas those for Nyx, S2b, and Sausage have the opposite behavior. At larger DM masses, where v min is sufficiently small, it is the height of the plateau in g(v min ) that dictates the orientation. As we note from Fig. 2, components with lower v mp such as Nyx have a higher plateau relative to those with a higher v mp such as SHM and S1. Thus, the rate will increase with an increase in η for Nyx, S2b and Sausage at a given large DM mass. This effect could be cancelled by increasing v min through decreasing m χ . Opposite arguments apply to the S1 stream.
Lastly, as was highlighted previously, the S1 stream peaks at higher Q bins relative to components with lower values of v mp . This creates a significant improvement in its resolution, especially at higher DM masses, when switching from the MAX to the OPT background model. There is also an improvement for other substructure components, but it is less striking.
So far our discussion has focused only on the toy scenario where the local DM astrophysical distribution consists of just a smooth halo along with one additional substructure component. Next, we explore the possibility of simultaneously constraining the substructure fractions of a local DM distribution with two additional components. Fig. 8 shows the results of our analysis through 68% CL sensitivity forecasts in η-η parameter space for m χ = 20 MeV. As we can see from Fig. 7, the electron recoil experiment has maximum sensitivity to the DM substructure fraction in the neighborhood of m χ = 20 MeV, implying that our forecasts in Fig. 8 are on the optimistic side. For three of our plots, we have fixed S1 as one of the components, combining it with Nyx, S2b, or Sausage. Since all components besides S1 have similar values of v mp , their degeneracy contours have similar characteristics. In particular, we note that the contours are very sensitive to changes in η S1 , while being highly degenerate in the fraction of the other component for the MAX background model. Moreover, there is a moderate improvement in the resolution when we use the OPT background. The sensitivity to η S1 can be clearly attributed to S1's distinct v mp , which is the highest among all components for any configuration. In the remaining plot between Nyx and S2a, we see that the substructure fractions are maximally degenerate for both background models, given the similar v mp values of both streams. This degeneracy for m χ ∼ > 5 MeV could be mitigated by reducing the back-ground in Q = 1, 2 bins, despite the difference in their v mp values being only ≈ 80 km/s.
While we take the mean values of stellar substructures as benchmarks of their DM counterparts, we do not include their uncertainties in our analysis. The magnitudes of these uncertainties are provided in Table IV. Incorporating them will not change our results much. As we have repeatedly emphasized, potentially the most important (systematic) uncertainty is the correlation between stellar and DM streams. We leave a more quantitative uncertainty analysis for the future, when the DM-stellar correlation in streams is better understood.
Finally, we comment on the cross section,σ e = 10 −38 cm 2 , used for obtaining the results discussed in this section. Since our benchmark cross section is in the neighborhood of the 5σ modulation discovery reach for all the benchmark masses we consider in Figs. 7 and 8, it is an intriguing prospect to see whether adding time domain information leads to better signal discrimination, or equivalently an improvement in the resolution of η. However, we have verified that for the different substructure combinations considered here, the effect of v mp always dominates those of t c and b, even when we assume a negligible background in the Q = 1, 2 bins. An alternative way to extract maximum information from time domain, particularly while using actual experimental data, is to consider correlations between time bins since these could be unique for each DM substructure component irrespective of their v mp value. Such a study is beyond the scope of our forecasting analysis and we defer it to a future work.

V. CONCLUSIONS AND OUTLOOK
The rich galactic dynamics of phase space substructure revealed by Gaia data suggests a potentially more complicated composition of DM around us beyond the simple description of SHM. This could have important implications for terrestrial DM probes, e.g., DM-e scattering experiment, which is the low-mass frontier of DM direct detection.
In this article, we first study how possible new substructure components could affect the observables of DMe scattering, both the time average recoil spectra and the yearly modulation. One could understand these effects through three quantities that characterize the substructure components: most probable speed v mp , characteristic time t c and coplanarity b. We then perform a likelihood-based analysis to demonstrate how the discovery reach of a future DM-e scattering experiment depends on different astrophysical DM models (see also a complementary study [55]). In particular, we show that given a discovery, DM-e scattering experiments could be sensitive to one or several DM substructure components, and will be able to constrain their corresponding fractions even when they are sub-dominant to the local DM density. This suggests an interesting opportunity to probe the astrophysical aspects of DM models using direct detection experiments.
The relationship between direct detection and local DM distributions is still an evolving subject -one that requires further study. In our study, we take the data of the stellar streams, identified using Gaia data, as proxies of their possible DM counterparts, and explore their potential effects at future DM-e scattering experiments. Yet the correlation between the stellar and associated DM substructure is not fully established. It is important to test and confirm the properties of DM substructure with further observations and numerical simulations.
Given our benchmarks, we find that DM-e experiments could probe and constrain fractions of DM in substructure, driven largely by their different v mp 's. It will be interesting to devise more sophisticated statistical analysis to take advantage of effects due to t c and b. Ideally, it will be fantastic to apply these methods on actual experimental data to probe the local DM distribution.
Acknowledgements We thank Lina Necib and Ciaran O'Hare for useful correspondence and for sharing their velocity distributions data files. We are also grateful to Tien-Tien Yu for offering critical comments on a preliminary draft of the manuscript. The results in this work were computed using the following open-source software: swordfish [52], IPython [56], matplotlib [57], scipy [58], and numpy [59]. JB, JF, JL, and M. B-A are supported by the DOE grant DE-SC-0010010 and NASA grant 80NSSC18K1010.

Appendix A: Astrophysical components
In this paper, we consider several possible astrophysical components in the solar neighborhood that have been discussed in the literature. We include the Gaia Sausage for tidal debris, a kinematic substructure resulting from older mergers which becomes well-mixed spatially and only manifests itself in the velocity distribution [60]. A distinctive feature of the Gaia Sausage is that there are two lobes in the radial velocity distribution, at v r = ±115.50 km/s. We also consider three stellar streams, which are kinematically cold substructures that are localized in both position and velocity space: i) Nyx, a prograde stream with ∼500 stars that slightly lags behind the MW disk [18]; ii) S1, a retrograde stream with 28 stars and a very high Earth-frame speed [21]; and iii) S2, a stream following a prograde orbit with a high vertical direction component. S2 has two constituents, S2a with 48 stars and S2b with 8 stars. S1 and S2 streams are two of the most prominent streams belonging to a group of substructures referred to as stellar shards [22].
Lastly, we include the halo component of the distributions. We use the SHM parameterized as in Ref. [36]. We also use, in conjunction with Gaia Sausage, the Gaia halo described in Ref. [16], which comes from the joint posterior for the halo component of the MW when modeling the Gaia Sausage substructure. Note that Gaia halo is different from SHM.
The full parameters of the velocity distributions of these components (in the galactic frame) are listed in Table IV. The mean values for the parameters with an  asterisk are obtained by fitting Gaussian distributions  to the kinematic data provided by Lina Necib, from Refs. [16,18]. Also included in this table are the uncertainties for all the parameters. The uncertainties in the parameters with an asterisk were derived by propagating the errors in the published literature to our fitted values [16,18]; while the uncertainties in the shards streams were provided by Ciaran O'Hare, from Ref. [22].
Since both tidal debris and streams are remnants of accretion from MW's surrounding satellites and subhalos, we expect that there are DM counterparts to the stellar components. The correlation between the velocity distributions of stellar components and their DM counterparts have been discussed for some substructures such as Gaia Sausage [17,38,61] and requires further investigation. While simulation shows that debris flow could be a good tracer of DM [17,61], the relation between DM and stellar components of streams is not established yet. Nonetheless, we use velocity distributions of stellar streams as tentative descriptions of their associated DM component. The purpose is to use these as benchmarks to show that DM substructure could have interesting distinctive effects on direct detection experiments while on the other hand, terrestrial DM experiments could probe astrophysical DM substructures. For a more precise description of DM substructure (especially DM streams), we need to wait for numerical simulations in the near future.
Based on the discussion above, we now write the velocity distribution f i (v) of the i-th astrophysical component in the Earth's (lab's) frame: where Σ i ≡ diag(σ 2 r , σ 2 φ , σ 2 z ) is the square of the velocity dispersion matrix, and µ i,lab (t) is the mean velocity of the DM wind boosted to the lab frame: with v is the Sun's velocity in the galactic rest frame, and V ⊕ (t) is the Earth's velocity in the heliocentric frame. Following [62][63][64], we take these to be: with ω = 2π/365.25 days −1 the Earth's angular speed, and V ⊕ = 29.79 km/s its orbital speed. 1,2 are two linearly independent vectors defining the Earth's circular orbit (ignoring its eccentricity) which, in the conventions of Refs. [65,66], point in the direction of the Earth's velocity during the vernal equinox (t Mar 21 = 79.26 days) and the summer solstice, respectively: Finally, Θ in Eq. (A1) is a Heaviside step function that cuts off the DM speed at escape velocity v esc , which throughout this paper we take to be 528 km/s [44]. Due to this velocity cut-off, we need a constant normalization factor N i,esc for each DM component.

Appendix B: Characteristic quantities
In section II B, we introduced the most probable speed v mp , characteristic time t c , and coplanarity b to describe all the DM components we have considered. We present their precise definitions below. (B1) For a DM component with a mean velocity µ lab (t) in the Earth's frame, given by Eq. (A2), the characteristic time is t c ≡ arg max t ( µ lab (t)). (B2) The coplanarity b is a measure of whether the DM wind lies in the same plane of the Earth's orbit. It is given by: andμ is the unit vector pointing in the direction of the DM wind in the heliocentric frame: The definition in Eq. (B3) is equivalent to sin λ with λ the angle between the normal of the Earth orbital plane andμ .
In this appendix, we include additional results that further illustrate the effects DM components could have on a SENSEI-like direct detection experiment. Fig. 10 shows the 5σ discovery reaches for DM with n = 0 (F DM = 1) scattering off electrons in a silicon-based experiment, for two background models, MAX and OPT, described in Table II. For each contour, we assume that 100% of local DM particles are drawn from the corresponding astrophysical component: SHM, Gaia Sausage, Nyx, S1, S2a, or S2b. Note that since these are discovery reaches for non-modulating DM signals, the characteristic time t c and the coplanarity b do not play a role.
In both background models, the high DM mass behavior of the reaches can be understood in terms of the v mp values of the distributions: the components with larger v mp 's relative to SHM present a stronger discovery reach (see Table I). Indeed, since at high DM masses v min is very small, the entire width of their respective g(v min ) is available for scattering, and as shown in Fig. 2, they receive a larger contribution from the crystal form factor at low energies. However, for those components with lower v mp 's, their higher g(v min ) relative to SHM is not enough to overcome the smaller form factor contribution, resulting in a weaker discovery reach. This effect is compounded by a low signal-to-noise ratio (no signal) in Q = 1, 2 bins, where the spectra for components with lower v mp 's peak, for the MAX (OPT) background model. Finally, we note that the relative strength of the OPT case compared to the MAX one is determined by the fact that the former has a lower constant background rate than the latter.
For low DM masses, v min becomes larger, thereby making only the tail of g(v min ) available for scattering. As a consequence, those distributions with both larger v mp (which allows for a wider of g(v min )) and velocity dispersion σ v have a stronger reach. This is more evident in the right panel of Fig. 10 corresponding to the OPT background model, since the ionization threshold Q th = 3 means that the narrow distributions with low v mp have to rely on their velocity dispersions σ v to produce any events.