Search for dark photons produced in 13 TeV $pp$ collisions

Searches are performed for both prompt-like and long-lived dark photons, $A^{\prime}$, produced in proton-proton collisions at a center-of-mass energy of 13 TeV, using $A^{\prime}\to\mu^+\mu^-$ decays and a data sample corresponding to an integrated luminosity of 1.6 fb$^{-1}$ collected with the LHCb detector. The prompt-like $A^{\prime}$ search covers the mass range from near the dimuon threshold up to 70 GeV, while the long-lived $A^{\prime}$ search is restricted to the low-mass region $214<m(A^{\prime})<350$ MeV. No evidence for a signal is found, and 90% confidence level exclusion limits are placed on the $\gamma$-$A^{\prime}$ kinetic-mixing strength. The constraints placed on prompt-like dark photons are the most stringent to date for the mass range $10.6<m(A^{\prime})<70$ GeV, and are comparable to the best existing limits for $m(A^{\prime})<0.5$ GeV. The search for long-lived dark photons is the first to achieve sensitivity using a displaced-vertex signature.

The possibility that dark matter particles may interact via unknown forces, felt only feebly by Standard Model (SM) particles, has motivated substantial effort to search for dark-sector forces (see Ref. [1] for a review). A compelling dark-force scenario involves a massive dark photon, $A^{\prime}$, whose coupling to the electromagnetic current is suppressed relative to that of the ordinary photon, $\gamma$, by a factor of $\varepsilon$. In the minimal model, the dark photon does not couple directly to charged SM particles; however, a coupling may arise via kinetic mixing between the SM hypercharge and $A^{\prime}$ field strength tensors [2][3][4][5][6][7]. This mixing provides a potential portal through which dark photons may be produced if kinematically allowed. If the kinetic mixing arises due to processes whose amplitudes involve one or two loops containing high-mass particles, perhaps even at the Planck scale, then $10^{-12} \lesssim \varepsilon^2 \lesssim 10^{-4}$ is expected [1]. Fully exploring this few-loop range of kinetic-mixing strength is an important goal of dark-sector physics.
A dark photon produced in proton-proton, $pp$, collisions via $\gamma$-$A^{\prime}$ mixing inherits the production mechanisms of an off-shell photon with $m(\gamma^*) = m(A^{\prime})$; therefore, both the production and decay kinematics of the $A^{\prime}\to\mu^+\mu^-$ and $\gamma^*\to\mu^+\mu^-$ processes are identical. Furthermore, the expected $A^{\prime}\to\mu^+\mu^-$ signal yield is given by [52]
$$n^{A^{\prime}}_{\rm ex}[m(A^{\prime}),\varepsilon^2] = \varepsilon^2 \left[\frac{n^{\gamma^*}_{\rm ob}[m(A^{\prime})]}{2\Delta m}\right] \mathcal{F}[m(A^{\prime})]\, \epsilon^{A^{\prime}}_{\gamma^*}[m(A^{\prime}),\tau(A^{\prime})], \qquad (1)$$
where $n^{\gamma^*}_{\rm ob}[m(A^{\prime})]$ is the observed prompt $\gamma^*\to\mu^+\mu^-$ yield in a small $\pm\Delta m$ window around $m(A^{\prime})$, the function $\mathcal{F}[m(A^{\prime})]$ includes phase-space and other known factors, and $\epsilon^{A^{\prime}}_{\gamma^*}[m(A^{\prime}),\tau(A^{\prime})]$ is the ratio of the $A^{\prime}\to\mu^+\mu^-$ and $\gamma^*\to\mu^+\mu^-$ detection efficiencies, which depends on the $A^{\prime}$ lifetime, $\tau(A^{\prime})$. If $A^{\prime}$ decays to invisible final states are negligible, then $\tau(A^{\prime}) \propto [m(A^{\prime})\varepsilon^2]^{-1}$ and $A^{\prime}\to\mu^+\mu^-$ decays can potentially be reconstructed as displaced from the primary $pp$ vertex (PV) when the product $m(A^{\prime})\varepsilon^2$ is small. When $\tau(A^{\prime})$ is small compared to the experimental resolution, $A^{\prime}\to\mu^+\mu^-$ decays are reconstructed as prompt-like and are experimentally indistinguishable from prompt $\gamma^*\to\mu^+\mu^-$ production, so that the efficiency ratio is approximately unity. This facilitates a fully data-driven search and the cancelation of most experimental systematic effects, since the observed $A^{\prime}\to\mu^+\mu^-$ yields, $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$, can be normalized to $n^{A^{\prime}}_{\rm ex}[m(A^{\prime}),\varepsilon^2]$ to obtain constraints on $\varepsilon^2$.
This Letter presents searches for both prompt-like and long-lived dark photons produced in $pp$ collisions at a center-of-mass energy of 13 TeV, using $A^{\prime}\to\mu^+\mu^-$ decays and a data sample corresponding to an integrated luminosity of 1.6 fb$^{-1}$ collected with the LHCb detector in 2016. The prompt-like $A^{\prime}$ search is performed from near the dimuon threshold up to 70 GeV, above which the $m(\mu^+\mu^-)$ spectrum is dominated by the $Z$ boson. The long-lived $A^{\prime}$ search is restricted to the mass range $214 < m(A^{\prime}) < 350$ MeV, where the data sample potentially provides sensitivity.
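As a rough illustration of how the normalization in Eq. 1 could be evaluated, the sketch below assumes that Eq. 1 has the form $n_{\rm ex} = \varepsilon^2\,[n_{\rm ob}/(2\Delta m)]\,\mathcal{F}\,\epsilon$, with $\epsilon$ the $A^{\prime}$-to-$\gamma^*$ efficiency ratio. The function name and all numerical inputs are hypothetical placeholders and are not taken from the analysis; $\mathcal{F}$ and the efficiency ratio are treated as externally supplied numbers.

```python
def expected_signal_yield(eps2, n_gamma_obs, delta_m, phase_space_factor,
                          eff_ratio=1.0):
    """Expected A' -> mu+ mu- yield under the assumed form of Eq. 1.

    eps2               : kinetic-mixing strength squared
    n_gamma_obs        : observed prompt gamma* -> mu+ mu- yield in +-delta_m
    delta_m            : half-width of the mass window (same units as used in F)
    phase_space_factor : value of F[m(A')] (assumed to be known)
    eff_ratio          : A'/gamma* efficiency ratio (about 1 for prompt-like decays)
    """
    if delta_m <= 0:
        raise ValueError("delta_m must be positive")
    return eps2 * n_gamma_obs / (2.0 * delta_m) * phase_space_factor * eff_ratio


# Hypothetical inputs, for illustration only.
n_ex = expected_signal_yield(eps2=1e-6, n_gamma_obs=2.0e6,
                             delta_m=0.5, phase_space_factor=10.0)
```

Because the prompt-like efficiency ratio is close to unity, the expected yield in this sketch scales linearly with $\varepsilon^2$; for long-lived decays the ratio itself depends on $\tau(A^{\prime})$ and hence on $\varepsilon^2$.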
Figure 1: Prompt-like mass spectrum (prompt-like sample, $p_{\rm T}(\mu) > 1$ GeV, $p(\mu) > 20$ GeV), where the categorization of the data as prompt $\mu^+\mu^-$, $\mu_Q\mu_Q$, and $hh + h\mu_Q$ is determined using the fits described in the text.
The prompt-like $A^{\prime}$ search strategy involves determining the observed $A^{\prime}\to\mu^+\mu^-$ yields from fits to the $m(\mu^+\mu^-)$ spectrum, and normalizing them using Eq. 1 to obtain constraints on $\varepsilon^2$. To determine $n^{\gamma^*}_{\rm ob}[m(A^{\prime})]$ for use in Eq. 1, binned extended maximum likelihood fits are performed using the dimuon vertex-fit quality, $\chi^2_{\rm VF}(\mu^+\mu^-)$, and $\min[\chi^2_{\rm IP}(\mu^\pm)]$ distributions, where $\chi^2_{\rm IP}(\mu)$ is defined as the difference in $\chi^2_{\rm VF}({\rm PV})$ when the PV is reconstructed with and without the muon track. The $\chi^2_{\rm VF}(\mu^+\mu^-)$ and $\min[\chi^2_{\rm IP}(\mu^\pm)]$ fits are performed independently at each mass, with the mean of the $n^{\gamma^*}_{\rm ob}[m(A^{\prime})]$ results used as the nominal value and half the difference assigned as a systematic uncertainty.
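The combination of the two independent fit results can be sketched as follows. This is only a minimal illustration of "mean as nominal, half the difference as systematic"; the function name and the example yields are hypothetical.

```python
def combine_fit_yields(n_from_vertex_fit, n_from_ip_fit):
    """Combine two independent yield estimates at one mass.

    Returns (nominal, systematic), where the nominal value is the mean of
    the two results and the systematic uncertainty is half their difference.
    """
    nominal = 0.5 * (n_from_vertex_fit + n_from_ip_fit)
    systematic = 0.5 * abs(n_from_vertex_fit - n_from_ip_fit)
    return nominal, systematic


# Hypothetical yields from the two fits at a single mass.
nominal, systematic = combine_fit_yields(1000.0, 1040.0)
```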
Both fit quantities are built from features that approximately follow $\chi^2$ probability density functions (PDFs) with minimal mass dependence. The prompt-dimuon PDFs are taken directly from data at $m(J/\psi)$ and $m(Z)$, where prompt resonances are dominant (see Fig. 1). Small $p_{\rm T}$-dependent corrections are applied to obtain the PDFs at all other masses. These PDFs are validated near threshold, at $m(\phi)$, and at $m(\Upsilon(1S))$, where the data predominantly consist of prompt dimuons. The sum of the $hh$ and $h\mu_Q$ contributions, which each involve misidentified prompt hadrons, is determined using same-sign $\mu^\pm\mu^\pm$ candidates that satisfy all of the prompt-like criteria. A correction is applied to the observed $\mu^\pm\mu^\pm$ yield at each mass to account for the difference in the production rates of $\pi^+\pi^-$ and $\pi^\pm\pi^\pm$, since double misidentified $\pi^+\pi^-$ pairs are the dominant source of the $hh$ background. This correction, which is derived using a prompt-like dipion data sample weighted by $p_{\rm T}$-dependent muon-misidentification probabilities, is as large as a factor of two near $m(\rho)$ but negligible for $m(\mu^+\mu^-) \gtrsim 2$ GeV. The PDFs for the $\mu_Q\mu_Q$ background, which involves muon pairs produced in $Q$-hadron decays that occur displaced from the PV, are obtained from simulation. These muons are rarely produced at the same spatial point unless the decay chain involves charmonium. Example $\min[\chi^2_{\rm IP}(\mu^\pm)]$ fit results are provided in Ref. [61], while Fig. 1 shows the resulting data categorizations. Finally, the $n^{\gamma^*}_{\rm ob}[m(A^{\prime})]$ yields are corrected for bin migration due to bremsstrahlung, and the small expected Bethe-Heitler contribution is subtracted [52].
The prompt-like mass spectrum is scanned in steps of $\sigma[m(\mu^+\mu^-)]/2$ searching for $A^{\prime}\to\mu^+\mu^-$ contributions. At each mass, a binned extended maximum likelihood fit is performed using all prompt-like candidates in a $\pm 12.5\,\sigma[m(\mu^+\mu^-)]$ window around $m(A^{\prime})$. The profile likelihood is used to determine the p-value and the confidence interval for $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$, from which an upper limit at 90% confidence level (CL) is obtained. The signal PDFs are determined using a combination of simulated $A^{\prime}\to\mu^+\mu^-$ decays and the widths of the large resonance peaks observed in the data. The strategy proposed in Ref. [65] is used to select the background model and assign its uncertainty. This method takes as input a large set of potential background components, which here includes all Legendre modes up to tenth order and dedicated terms for known resonances, and then performs a data-driven model-selection process whose uncertainty is included in the profile likelihood following Ref. [66]. More details about the fits, including discussion on peaking backgrounds, are provided in Ref. [61]. The most significant excess is $3.3\sigma$ at $m(A^{\prime}) \approx 5.8$ GeV, corresponding to a p-value of 38% after accounting for the trials factor due to the number of prompt-like signal hypotheses.
Regions of the $[m(A^{\prime}), \varepsilon^2]$ parameter space where the upper limit on $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$ is less than $n^{A^{\prime}}_{\rm ex}[m(A^{\prime}), \varepsilon^2]$ are excluded at 90% CL. Figure 2 shows that the constraints placed on prompt-like dark photons are comparable to the best existing limits below 0.5 GeV, and are the most stringent for $10.6 < m(A^{\prime}) < 70$ GeV. In the latter mass range, a nonnegligible model-dependent mixing with the $Z$ boson introduces additional kinetic-mixing parameters altering Eq. 1; however, the expanded $A^{\prime}$ model space is highly constrained by precision electroweak measurements. This search adopts the parameter values suggested in Refs. [67,68]. The LHCb detector response is found to be independent of which quark-annihilation process produces the dark photon above 10 GeV, making it easy to recast the results in Fig. 2 for other models.
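The exclusion criterion can be made concrete with a short worked relation. It is a sketch under two assumptions: that Eq. 1 has the form $n_{\rm ex} = \varepsilon^2\,[n_{\rm ob}^{\gamma^*}/(2\Delta m)]\,\mathcal{F}\,\epsilon$, and that the efficiency ratio $\epsilon$ is approximately unity for prompt-like decays. The symbol $n^{A^{\prime}}_{\rm UL}$, denoting the 90% CL upper limit on the observed signal yield, is a label introduced here for illustration. Since the expected yield is then linear in $\varepsilon^2$, requiring $n^{A^{\prime}}_{\rm UL} < n^{A^{\prime}}_{\rm ex}$ gives the excluded region at each mass:

```latex
\varepsilon^2 \;>\; \varepsilon^2_{\rm UL}[m(A^{\prime})] \;\equiv\;
\frac{2\Delta m \; n^{A^{\prime}}_{\rm UL}[m(A^{\prime})]}
     {n^{\gamma^*}_{\rm ob}[m(A^{\prime})]\; \mathcal{F}[m(A^{\prime})]}
```

In the mass range where mixing with the $Z$ boson alters Eq. 1, this simple form would not apply unchanged.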
For the long-lived dark photon search, the stringent criteria applied in the trigger make contamination from prompt muon candidates negligible. The dominant background contributions to the long-lived $A^{\prime}$ search are as follows: photon conversions to $\mu^+\mu^-$ in the silicon-strip vertex detector (the VELO) that surrounds the $pp$ interaction region [69]; $b$-hadron decays where two muons are produced in the decay chain; and the low-mass tail from $K^0_{\rm S}\to\pi^+\pi^-$ decays, where both pions are misidentified as muons. Additional sources of background, e.g. kaon and hyperon decays, and $Q$-hadron decays producing a muon and a hadron that is misidentified as a muon, are negligible.
Photon conversions in the VELO dominate the long-lived data sample at low masses. A new method, which is described in detail in Ref. [70], was recently developed for identifying particles created in secondary interactions with the VELO material. A high-precision three-dimensional material map was produced from a data sample of secondary hadronic interactions. Using this material map, along with properties of the $A^{\prime}\to\mu^+\mu^-$ decay vertex and muon tracks, a p-value is assigned to the photon-conversion hypothesis for each long-lived $A^{\prime}\to\mu^+\mu^-$ candidate. A mass-dependent requirement is applied to these p-values that reduces the expected photon-conversion yields to a negligible level.
A characteristic signature of muons produced in $b$-hadron decays is the presence of additional displaced tracks. Events are rejected if they are selected by the inclusive $Q$-hadron software trigger [71] independently of the presence of the $A^{\prime}\to\mu^+\mu^-$ candidate. Furthermore, two boosted decision tree (BDT) classifiers, originally developed for studying $B^0_{(s)}\to\mu^+\mu^-$ decays [72], are used to identify other tracks in the event that are consistent with having originated from the same $b$-hadron decay as the signal muon candidates. The requirements placed on the BDT responses, which are optimized using a data sample of $K^0_{\rm S}$ decays as a signal proxy, reject 70% of the $b$-hadron background at a cost of about 10% loss in signal efficiency.
As in the prompt-like $A^{\prime}$ search, the normalization is based on Eq. 1; however, in this case the efficiency ratio $\epsilon^{A^{\prime}}_{\gamma^*}[m(A^{\prime}),\tau(A^{\prime})]$ is not unity, in part because the efficiency depends on the decay time, $t$. Furthermore, the looser kinematic, muon-identification, and hardware-trigger requirements applied to long-lived $A^{\prime}\to\mu^+\mu^-$ candidates, compared with prompt-like candidates, increase the efficiency by a factor of 7 to 10, ignoring $t$-dependent effects. These $m(A^{\prime})$-dependent factors are determined using a small control data sample of dimuon candidates consistent with originating from the PV, but otherwise satisfying the long-lived criteria. A relative 10% systematic uncertainty is assigned to the long-lived $A^{\prime}\to\mu^+\mu^-$ normalization due to background contamination in the control sample.
The fact that the kinematics are identical for $A^{\prime}\to\mu^+\mu^-$ and prompt $\gamma^*\to\mu^+\mu^-$ decays for $m(A^{\prime}) = m(\gamma^*)$ enables the $t$ dependence of the signal efficiency to be determined using a data-driven approach. For each value of $[m(A^{\prime}), \tau(A^{\prime})]$, prompt $\gamma^*\to\mu^+\mu^-$ candidates in the control data sample near $m(A^{\prime})$ are resampled many times as long-lived $A^{\prime}\to\mu^+\mu^-$ decays, and all $t$-dependent properties, e.g. $\min[\chi^2_{\rm IP}(\mu^\pm)]$, are recalculated based on the resampled decay-vertex locations. This approach is validated in simulation by using prompt $A^{\prime}\to\mu^+\mu^-$ decays to predict the properties of long-lived $A^{\prime}\to\mu^+\mu^-$ decays, and based on these studies a 2% systematic uncertainty is assigned to the signal efficiencies.
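The first step of such a resampling can be sketched as below: for one prompt candidate, proper decay times are drawn from an exponential distribution with the hypothesized lifetime, and the decay vertex is displaced from the PV along the candidate momentum. This is only an illustrative sketch, not the procedure used in the analysis; the function name, argument layout, and example numbers are assumptions, and the recalculation of $t$-dependent quantities and the application of the selection are not shown.

```python
import numpy as np

C_MM_PER_PS = 0.299792458  # speed of light in mm/ps


def resample_decay_vertices(pv, momentum, mass, lifetime_ps, n_samples, rng):
    """Draw displaced decay vertices for one prompt dimuon candidate.

    pv          : (3,) primary-vertex position in mm
    momentum    : (3,) dimuon momentum in GeV
    mass        : dimuon mass in GeV
    lifetime_ps : hypothesized A' lifetime in ps
    Returns (decay_times_ps, vertices_mm) with shapes (n,) and (n, 3).
    """
    pv = np.asarray(pv, dtype=float)
    momentum = np.asarray(momentum, dtype=float)
    # Proper decay times follow an exponential with mean equal to the lifetime.
    t = rng.exponential(scale=lifetime_ps, size=n_samples)
    # Flight vector = (p / m) * c * t, i.e. beta*gamma*c*t along the momentum.
    flight = np.outer(t, momentum / mass) * C_MM_PER_PS
    return t, pv + flight


# Hypothetical candidate: 30 GeV along z, mass 0.25 GeV, lifetime 1 ps.
rng = np.random.default_rng(0)
times, vertices = resample_decay_vertices(
    pv=[0.0, 0.0, 0.0], momentum=[0.0, 0.0, 30.0], mass=0.25,
    lifetime_ps=1.0, n_samples=1000, rng=rng)
```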
The $\epsilon^{A^{\prime}}_{\gamma^*}[m(A^{\prime}),\tau(A^{\prime})]$ values integrated over $t$ are provided in Ref. [61]. A scan is again performed in discrete steps of $\sigma[m(\mu^+\mu^-)]/2$ looking for $A^{\prime}\to\mu^+\mu^-$ contributions; however, in this case, discrete steps in $\tau(A^{\prime})$ are also considered. Binned extended maximum likelihood fits are performed using all long-lived candidates and the three-dimensional feature space of $m(\mu^+\mu^-)$, $t$, and the consistency of the decay topology as quantified in the decay-fit $\chi^2_{\rm DF}$, which has three degrees of freedom (the data distribution is provided in Ref. [61]). The expected conversion contribution is derived in each bin from the number of candidates rejected by the conversion criterion. Two large control data samples are used to develop and validate the modeling of the $b$-hadron and $K^0_{\rm S}$ contributions. [...] $2.0\sigma$. More details about these fits are provided in Ref. [61].
Under the assumption that $A^{\prime}$ decays to invisible final states are negligible, there is a fixed (and known) relationship between $\tau(A^{\prime})$ and $\varepsilon^2$ at each mass [52]; therefore, the upper limits on $n$ [...]
In summary, searches are performed for both prompt-like and long-lived dark photons produced in $pp$ collisions at a center-of-mass energy of 13 TeV, using $A^{\prime}\to\mu^+\mu^-$ decays and a data sample corresponding to an integrated luminosity of 1.6 fb$^{-1}$ collected with the LHCb detector during 2016. The prompt-like $A^{\prime}$ search covers the mass range from near the dimuon threshold up to 70 GeV, while the long-lived $A^{\prime}$ search is restricted to the low-mass region $214 < m(A^{\prime}) < 350$ MeV. No evidence for a signal is found, and 90% CL exclusion regions are set on the $\gamma$-$A^{\prime}$ kinetic-mixing strength. The constraints placed on prompt-like dark photons are the most stringent to date for the mass range $10.6 < m(A^{\prime}) < 70$ GeV, and are comparable to the best existing limits for $m(A^{\prime}) < 0.5$ GeV. The search for long-lived dark photons is the first to achieve sensitivity using a displaced-vertex signature.
These results demonstrate the unique sensitivity of the LHCb experiment to dark photons, even using a data sample collected with a trigger that is inefficient for low-mass $A^{\prime}\to\mu^+\mu^-$ decays. Using knowledge gained from this analysis, the software-trigger efficiency for low-mass dark photons has been significantly improved for 2017 data taking. Looking forward to Run 3, the planned increase in luminosity and removal of the hardware-trigger stage should increase the number of expected $A^{\prime}\to\mu^+\mu^-$ decays in the low-mass region by a factor of $\mathcal{O}(100\text{-}1000)$ compared to the 2016 data sample.

Prompt-Like Fits
The fit strategy denoted by aic-o and described in detail in Ref. [65] is used in the prompt-like $A^{\prime}$ search. The $m(\mu^+\mu^-)$ spectrum is scanned in steps of $\sigma[m(\mu^+\mu^-)]/2$ searching for $A^{\prime}\to\mu^+\mu^-$ contributions. At each mass, a binned extended maximum likelihood fit is performed, and the profile likelihood is used to determine the p-value and the confidence interval on $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$. The prompt-like-search trials factor is obtained using pseudoexperiments. As in Ref. [65], each fit is performed in a $\pm 12.5\,\sigma[m(\mu^+\mu^-)]$ window around the scan-mass value using bins with widths of $\sigma[m(\mu^+\mu^-)]/20$. Near threshold, the quantity $q(\mu^+\mu^-) \equiv m(\mu^+\mu^-)^2 - 4m(\mu)^2$ is used instead of the mass since it is easier to model. The confidence intervals are defined using the bounded likelihood approach, which involves taking $\Delta\log\mathcal{L}$ relative to zero signal, rather than the best-fit value, if the best-fit signal value is negative. This approach enforces that only physical (nonnegative) upper limits are placed on $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$, and prevents defining exclusion regions that are much better than the experimental sensitivity in cases where a large deficit in the background yield is observed.
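The effect of the bounded likelihood approach can be illustrated with a toy Gaussian likelihood in the signal yield. This is a sketch only: the Gaussian form, the function name, and the default threshold (the one-sided 90% Gaussian quantile, $\Delta\log\mathcal{L} = 1.2816^2/2$) are assumptions made here for illustration and are not stated in the text.

```python
import math


def bounded_upper_limit(s_hat, sigma, delta_log_l=0.5 * 1.2816**2):
    """Upper limit on a signal yield for a toy Gaussian log-likelihood,
    log L(s) = -0.5 * ((s - s_hat) / sigma)**2.

    The reference point for Delta log L is the best-fit value if it is
    nonnegative, and zero signal otherwise (bounded likelihood).
    """
    s_ref = max(s_hat, 0.0)
    log_l_ref = -0.5 * ((s_ref - s_hat) / sigma) ** 2
    # Solve log_l_ref - log L(s_up) = delta_log_l for s_up > s_hat.
    return s_hat + sigma * math.sqrt(2.0 * (delta_log_l - log_l_ref))


# Hypothetical fit results: a downward fluctuation of two standard deviations.
limit_deficit = bounded_upper_limit(s_hat=-20.0, sigma=10.0)
limit_no_signal = bounded_upper_limit(s_hat=0.0, sigma=10.0)
```

In this toy, a best-fit yield of $-20 \pm 10$ still gives a positive upper limit, whereas measuring $\Delta\log\mathcal{L}$ from the (negative) best-fit value would place the limit below zero.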
The signal models are determined at each $m(A^{\prime})$ using a combination of simulated $A^{\prime}\to\mu^+\mu^-$ decays and the widths of the large resonance peaks that are clearly visible in the data. The background models are chosen following the method of Ref. [65]. This method takes as input a large set of potential background components, then performs a data-driven model-selection process whose uncertainty is included in the profile likelihood following Ref. [66]. In this analysis, the set of possible background components includes all Legendre modes with $\ell \le 10$ at every $m(A^{\prime})$. Additionally, dedicated background components are included to model the near-threshold turn-on behavior and all sizable known resonance contributions.
The use of 11 Legendre modes adequately describes every double-misidentified peaking background that contributes at a significant level; e.g., $\phi\to K^+K^-$ and $D\to K^\pm\pi^\mp$ decays that are double misidentified as dimuons (and, in the $D$ case, misreconstructed as prompt-like) do not require dedicated background components. In mass regions where such complexity is not required, the data-driven model-selection procedure reduces the complexity, which increases the sensitivity to a potential signal contribution. As in Ref. [65], all fit regions are transformed onto the interval $[-1, 1]$, where the scan $m(A^{\prime})$ value maps to zero. After such a transformation, the signal model is (approximately) an even function; therefore, odd Legendre modes are orthogonal to the signal component, which means that the presence of odd modes has minimal impact on the variance of $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$. In the prompt-like fits, all odd Legendre modes up to ninth order are included in every background model, while only a subset of the even modes is selected for inclusion in each fit.
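The parity argument can be checked numerically. The sketch below maps a hypothetical fit window onto $[-1, 1]$ and computes the overlap of an even (Gaussian) signal shape with the first few Legendre modes; the Gaussian shape, the scan mass, and the resolution value are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def to_unit_interval(m, m_scan, half_width):
    """Map masses in [m_scan - half_width, m_scan + half_width] onto [-1, 1]."""
    return (np.asarray(m, dtype=float) - m_scan) / half_width


def legendre_mode(order, x):
    """Evaluate the Legendre polynomial of the given order at x."""
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0
    return legendre.legval(x, coeffs)


# Hypothetical scan point: window of +-12.5 sigma around m_scan (GeV).
m_scan, sigma_m = 1.0, 0.004
half_width = 12.5 * sigma_m
m = np.linspace(m_scan - half_width, m_scan + half_width, 5001)
x = to_unit_interval(m, m_scan, half_width)
signal = np.exp(-0.5 * ((m - m_scan) / sigma_m) ** 2)  # even about x = 0

# Overlap of the signal shape with each mode (simple Riemann sum).
dx = x[1] - x[0]
overlaps = {k: float(np.sum(signal * legendre_mode(k, x)) * dx) for k in range(4)}
```

In this toy the overlaps with the odd modes vanish to numerical precision while those with the even modes do not, which is why only the even modes compete with the signal component in the fit.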
Regions in the mass spectrum where large known resonance contributions are observed are vetoed in the prompt-like $A^{\prime}$ search. Furthermore, the regions near the $\eta$ meson and the excited $\Upsilon$ states (beyond the $\Upsilon(4S)$ meson) are treated specially. For example, since it is not possible to distinguish between $A^{\prime}\to\mu^+\mu^-$ and $\eta\to\mu^+\mu^-$ contributions at $m(\eta)$, the p-values near this mass are ignored. Any excess at $m(\eta)$ is treated as signal when setting the limits on $n^{A^{\prime}}_{\rm ob}[m(A^{\prime})]$, which is conservative in that an $\eta\to\mu^+\mu^-$ contribution will weaken the constraints on $A^{\prime}\to\mu^+\mu^-$ decays. The same strategy is used near the excited $\Upsilon$ masses. The treatment of all mass regions is summarized in Table 1.

Long-Lived Fits
The long-lived signal yields are determined from binned extended maximum likelihood fits performed on all long-lived $A^{\prime}\to\mu^+\mu^-$ candidates using the three-dimensional feature space of the dimuon invariant mass, $m(\mu^+\mu^-)$, the $A^{\prime}$ decay time, $t$, and the decay-fit quality, $\chi^2_{\rm DF}$. As in the prompt-like $A^{\prime}$ search, a scan is performed in discrete steps of $\sigma[m(\mu^+\mu^-)]/2$; however, in this case, discrete steps in $\tau(A^{\prime})$ are also considered. The profile likelihood is again used to obtain the p-values and the confidence intervals on [...]. The binning scheme involves four bins in $\chi^2_{\rm DF}$: [0,2], [2,4], [4,6], and [6,8] [...] [3,5], [5,10], and $>10$ ps. The binning scheme used for $m(\mu^+\mu^-)$ depends on the scan $m(A^{\prime})$ value, and is chosen such that the majority of the signal falls into a single bin. Signal decays mostly have small $\chi^2_{\rm DF}$ values, with about 50% (80%) of $A^{\prime}\to\mu^+\mu^-$ decays satisfying $\chi^2_{\rm DF} < 2$ (4). Background from $b$-hadron decays populates the small $t$ region and is roughly uniformly distributed in $\chi^2_{\rm DF}$, whereas background from $K^0_{\rm S}$ decays is signal-like in $\chi^2_{\rm DF}$ and roughly uniformly distributed in $t$. Figure 4 shows the three-dimensional distribution of all long-lived $A^{\prime}\to\mu^+\mu^-$ candidates. The expected contribution in each bin from photon conversions is derived from the number of candidates rejected by the conversion criterion. As discussed in the Letter, two large control data samples are used to develop and validate the modeling of the $b$-hadron and $K^0_{\rm S}$ contributions. Both contributions are well modeled by the function [...], where $q_0$, $a$, and $b$ are fitted to the data, and $\Theta$ denotes the Heaviside step function. While no evidence for $t$ or $\chi^2_{\rm DF}$ dependence is observed for these parameters in either the $b$-hadron or $K^0_{\rm S}$ control sample, all parameters are allowed to vary independently in each $[t, \chi^2_{\rm DF}]$ region in the fits used in the long-lived $A^{\prime}$ search.
Figure 6 shows the long-lived $A^{\prime}\to\mu^+\mu^-$ candidates, along with the pull values obtained from fits performed to the data where no signal contributions are included. All of the pulls are in the range $[-2, 2]$. Note that, because the background threshold parameters are free to vary in each $[t, \chi^2_{\rm DF}]$ region, the lowest-mass nonempty bin for each $[t, \chi^2_{\rm DF}]$ is biased towards a positive pull in the absence of a signal contribution.